Milenkovic receives grant to develop codes for DNA data storage system


With our society’s ever-growing volume of data, DNA is emerging as a potential storage medium of unprecedented density, durability, and efficiency.

CSL Professor Olgica Milenkovic and her team have been awarded a $500,000 grant from the National Science Foundation to continue their work on making DNA a viable data storage medium. The funded project will specifically investigate how to encode data in a manner suitable for portable and robust DNA-based data storage systems.

When putting data on any storage device, from a floppy disk to a DNA strand, a “coding language” must be created so that the system can write, store, and read information, and then translate it back without detrimental errors. To make DNA a viable medium, researchers must design a dedicated coding system suited to writing via DNA synthesis and reading via high-throughput sequencing.

DNA molecules are built using four nucleotides: cytosine, guanine, adenine, and thymine (C, G, A, and T). These letters, along with specific association and context-based grammatical rules, become the language that Milenkovic uses to store data.
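As a minimal illustration of how bits can be written in this four-letter alphabet, the Python sketch below assumes the simplest possible mapping: two bits per nucleotide (00→A, 01→C, 10→G, 11→T). This mapping and the `encode`/`decode` helpers are hypothetical choices for illustration, not the encoding used by Milenkovic’s team. Practical systems add constraints that this sketch ignores, such as avoiding long runs of the same base and balancing GC content.

```python
# Hypothetical 2-bit-per-nucleotide mapping (an illustrative assumption,
# not the mapping used by Milenkovic's team).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes into a DNA string, two bits per nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Translate a DNA string produced by encode() back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)  # CAGACGGC
    assert decode(strand) == b"Hi"
```

Under this mapping each byte becomes four nucleotides, so the two-byte message `b"Hi"` becomes the eight-letter strand `CAGACGGC`.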

“One can make DNA strings that contain any desired information by arranging the A, T, G, C letters in a certain manner,” said Milenkovic, a professor of electrical and computer engineering. “If you encode the information with redundancy that helps in preventing and correcting errors, you can read it back without errors.”

To test the method, Milenkovic and her team recently encoded a Citizen Kane poster and other images into DNA. When they retrieved the data without coding redundancy, the images were unrecognizable. When only 15% redundancy was added, the images came back perfectly intact, with no errors.
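The role of redundancy can be sketched with one of the simplest erasure codes: a parity block equal to the bytewise XOR of k data blocks, which lets a decoder rebuild any single lost block. The helper names and sample blocks below are hypothetical, and this is not the code Milenkovic’s team uses; their codes must also cope with substitutions, insertions, and deletions inside strands, which a single parity block cannot correct. One parity block per k data blocks adds 1/k redundancy, so k = 7 gives about 14%.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) > 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def add_parity(data_blocks: list[bytes]) -> list[bytes]:
    """Append one parity block: the XOR of all data blocks."""
    return data_blocks + [xor_blocks(data_blocks)]


def recover(blocks: list[bytes | None]) -> list[bytes]:
    """Rebuild at most one missing block (marked None), then return the data blocks."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("a single parity block can repair only one lost block")
    blocks = list(blocks)
    if missing:
        # XOR of all data blocks plus parity is zero, so the XOR of the
        # surviving blocks equals the missing one.
        blocks[missing[0]] = xor_blocks([b for b in blocks if b is not None])
    return blocks[:-1]  # drop the parity block


if __name__ == "__main__":
    data = [b"CITI", b"ZEN ", b"KANE", b"POST"]  # hypothetical data blocks
    stored = add_parity(data)                   # 1 parity per 4 blocks: 25% overhead
    stored[2] = None                            # simulate one lost strand
    assert recover(stored) == data
```

Stronger codes trade more redundancy for the ability to fix more, and different kinds of, errors; the parity block is only the smallest step in that direction.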

“If there are too many errors, we can’t read it back properly, and everything, natural and human-made, is prone to errors. So our strategy is to correct errors that arise at every level of the data recording process: during synthesis and during DNA sequencing. And the errors are nothing we have seen before in classical communication systems,” said Milenkovic.

Though creating a viable storage system in DNA comes with challenges, the reward is great: a study has shown that if all the world’s data could be stored in DNA, it would fit in the trunk of a car.

Milenkovic is working with various industry partners to continue developing this technology, including investigating the role nanopore sequencing could play in portable and accurate DNA coding and reading. Nanopore reading involves threading a DNA strand through a tiny pore that reads the sequence content symbol by symbol. The nanopore reading technology is built into a low-cost, portable device.

“Our work represents the only known random access DNA-based data storage system that uses highly error-prone nanopore sequencers, while still producing error-free readouts with the highest reported information rate/density,” said Milenkovic.

The development of this nanopore-based recording technology represents a crucial step toward the practical use of DNA as a storage medium.