CSL professor explores DNA as data storage
“Due to its longevity and enormous information density, DNA has become an attractive medium for long-term archival storage,” says Shomorony, an assistant professor of electrical and computer engineering. “However, this new approach to data storage has unique properties.”
The idea of storing data in DNA has been around since the 1960s but has only recently become realistic. Major developments in the technologies that synthesize and sequence (write and read) DNA have made the process more affordable – though still not cheap.
Storing data this way requires the synthesis of a large number of short DNA molecules that are then mixed out of order in a liquid solution. As a result, writing data is very expensive and retrieving it is computationally complex. The goal of Shomorony’s research is to develop tools to efficiently write and read data from this out-of-order medium. Specifically, the team will characterize fundamental tradeoffs between information density, reading and writing speeds, and reliability.
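To make the out-of-order problem concrete, here is a minimal Python sketch of one common way to cope with it: attaching an index to each short strand so the original order can be rebuilt after the strands are mixed. It is an illustration only, not the method used by Shomorony’s team. The two-bits-per-nucleotide mapping, the function names, the 2-byte index and the 8-byte payload are all assumptions made for this example, and the sketch ignores synthesis errors, sequencing errors and lost strands.

```python
import random

BASES = "ACGT"   # toy mapping (assumption): each nucleotide carries 2 bits
INDEX_BYTES = 2  # hypothetical index size: 2 bytes = 8 nucleotides per strand


def bytes_to_dna(data: bytes) -> str:
    """Map each byte to 4 nucleotides, most significant bit pair first."""
    return "".join(
        BASES[(b >> shift) & 0b11] for b in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_dna; len(seq) must be a multiple of 4."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for base in seq[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


def write_strands(data: bytes, payload_bytes: int = 8) -> list[str]:
    """Split data into short indexed strands, then shuffle them
    to mimic molecules mixed out of order in solution."""
    strands = []
    for n, start in enumerate(range(0, len(data), payload_bytes)):
        chunk = data[start:start + payload_bytes]
        # Prefix each chunk with its position so order can be recovered.
        strands.append(bytes_to_dna(n.to_bytes(INDEX_BYTES, "big") + chunk))
    random.shuffle(strands)
    return strands


def read_strands(strands: list[str]) -> bytes:
    """Decode every strand, sort by the index prefix, and join the payloads."""
    decoded = [dna_to_bytes(s) for s in strands]
    decoded.sort(key=lambda d: int.from_bytes(d[:INDEX_BYTES], "big"))
    return b"".join(d[INDEX_BYTES:] for d in decoded)


message = b"DNA as an archival storage medium"
pool = write_strands(message)         # shuffled list of short strands
assert read_strands(pool) == message  # order is recovered from the indices
```

Even in this toy version, the index occupies nucleotides that could otherwise hold data, so shorter strands spend a larger share of their length on bookkeeping. That is one simple instance of the tradeoff between information density and the other design goals described above.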
“The idea of utilizing DNA, the molecule that carries the genetic information of all living beings, to store generic digital data, such as images and audio, is fascinating to me,” says Shomorony. “It combines two of my main research interests: information theory and genomics.”
Current storage media guarantee data lifetimes of only a few years; however, research institutes often have an interest in guaranteeing that data being generated right now will still be readable decades or even centuries from now. We are also producing data at a faster rate than ever before and all of it has to go somewhere – just think of how fast you take up the storage on your home computer.
“DNA has been shown to be able to store in the order of exabytes (10^18 bytes) of data per gram,” says Shomorony. “Accelerated aging experiments show that data can be reliably archived on DNA for many millennia. Making this technology a reality will benefit any institution that requires long-term reliable storage of very large amounts of data.”
Shomorony and his team also seek to develop a more general theory for data storage and communication using “unordered” media. The techniques developed in this research will be applicable to any scenario that involves recovering data that has been shuffled out of order. Therefore, this work is applicable to DNA-based computation and is connected to more standard digital communication applications such as data networks, where packets may be out of order or corrupted.
The future use of DNA storage technologies relies on the development of more cost-effective solutions. Shomorony expects that researchers will develop cheaper synthesis and sequencing techniques. The DNA currently utilized is synthesized from scratch, but it’s expected that the DNA from living organisms can be used instead, providing a cheaper alternative to full synthesis.
This research is funded by the NSF (CISE).