CSL graduate students work to overcome DNA storage challenges


Allie Arp, CSL

The world has a growing surplus of data. The key to storing this vast digital information may lie in the carrier of the genetic code – DNA. However, DNA, which offers many benefits in the form of nonvolatility and extreme storage density, carries a prohibitively high price tag and is less than perfect due to a high number of errors introduced during synthesis and sequencing.

Two CSL graduate students, Kasra Tabatabaei and Chao Pan, are working to overcome these challenges through “DNA punch cards” and a new chimeric-DNA methodology. Their research, which aims to reduce costs and error rates, may help make DNA storage a reality sooner rather than later. Both are students of CSL Professor Olgica Milenkovic, a professor of electrical and computer engineering, and are co-advised by Charles Schroeder from the Chemistry Department.

“The main reason to use DNA-based data storage is because DNA is an extremely dense and very durable medium for storing data,” said Tabatabaei, a graduate student in biophysics and quantitative biology. “If you store data on a flash drive, you can’t use it after 10 to 20 years. If you store data in DNA, you can retrieve it after centuries.”

Both projects have received news coverage. Tabatabaei’s work has appeared in Nature Communications and Scientific American, while Pan’s was covered by Integrated DNA Technologies.

DNA punch cards for storing data on native DNA sequences via enzymatic nicking
Tabatabaei is working to reduce costs and errors through a DNA “punch card” method that uses naturally occurring DNA, which doesn’t have to be synthesized before data is stored on it.
Kasra Tabatabaei

Some DNA strands are too long to be synthesized. Native DNA avoids this problem because the researchers encode data not symbol by symbol, but through topological modifications of the DNA backbone. The topology is altered by “nicking” the DNA, which creates single bond breaks at predetermined positions on one or both strands. The data can then be read from this nicked, or hole-punched, DNA through standard DNA sequencing.
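As a rough illustration of the punch-card idea, the short Python sketch below treats each predetermined backbone position as one bit: a nick means 1, an intact bond means 0. The function names, the site positions, and the one-bit-per-site mapping are assumptions made for this example; the article does not describe the researchers' actual encoding at this level of detail.

```python
def encode_bits(bits, sites):
    """Return the set of predetermined sites to nick (bit 1 = nick, bit 0 = leave intact).

    `sites` is a hypothetical list of backbone positions chosen in advance.
    """
    if len(bits) != len(sites):
        raise ValueError("need exactly one bit per predetermined site")
    return {site for bit, site in zip(bits, sites) if bit == 1}


def decode_bits(nicked_sites, sites):
    """Recover the bit string by checking which predetermined sites carry a nick."""
    return [1 if site in nicked_sites else 0 for site in sites]


# Hypothetical nicking positions along a native DNA strand (illustrative only).
sites = [10, 25, 40, 55, 70, 85, 100, 115]
message = [0, 1, 0, 0, 1, 0, 1, 1]

nicked = encode_bits(message, sites)
recovered = decode_bits(nicked, sites)
assert recovered == message
```

In this toy model the underlying sequence is never rewritten; only the pattern of nicks carries the message, which is why no synthesis step appears.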

“The DNA punch-card method we have developed uses native DNA which is any kind of naturally occurring DNA that you can extract from bacterial cells, human cells, or any kind of living organism or a virus so there is no need to synthesize DNA,” said Tabatabaei. “This substantially reduces the cost of information storage because synthesizing DNA is a very expensive procedure and has its own limitations.”

Future research on this project will look for additional applications of this storage method, including watermarking DNA-stored information to protect secret information.

“Our DNA punch-card method is the first work that also enables in-memory computations on information that is stored in DNA,” Tabatabaei said. “That is a huge leap because our system is not just an archival media but part of a computer -- the data you have stored can be directly operated on. Our DNA punch-card method is the first one that has this capability, in addition to being able to record information in parallel and securely erase it through strand ligation.”

An on-chip nanoscale storage system using chimeric DNA
Chao Pan
Meanwhile, Pan is working to reduce synthesis cost by improving DNA-based image storage systems so that they do not rely on error-correction redundancy.

“We’re proposing that we do not use complex coding redundancy which was commonly used before,” said Pan, an ECE graduate student. “Rather we use computer vision and machine learning methods to help us deal with issues, such as missing DNA blocks and other errors.”

To test the new coding scheme, Pan and Milenkovic chose images that reflect a shared interest: Marlon Brando movie posters. The duo encoded images of the posters into DNA, decoded the stored information, and used specialized learning methods to automatically identify and correct potential errors in the DNA. Most of the reconstructed images had no observable defects when compared with the originals. Pan believes this approach could be used by companies that currently maintain large-scale data storage systems containing many complex images.

An example of the movie posters used in Pan and Milenkovic's research, before, during, and after using the chimeric-DNA method

The researchers’ next steps include continuing to improve the machine learning-based decoding technique.

“Biotechnology is moving very fast, so within five to 10 years or so, DNA may start to gradually replace some of the large-scale storage systems that are being used now,” said Pan. “Businesses like Facebook facilitate the production of huge amounts of pictures, videos, and texts every day. They can actually no longer afford to store all of this information, but if they can transfer it to DNA storage it can be stored in a very compact way.”