New computation tool to make DNA analysis more affordable, efficient

6/4/2020 Allie Arp, CSL

In the near future, scientific advancements will enable personalized medicine based on an individual’s DNA. DNA sequencing has become much more affordable, at around $1,000 per sample. However, DNA analysis can still cost as much as $100,000, putting the technology out of reach for most people. CSL student Subho Banerjee is conducting research to make this process more efficient and affordable.

Banerjee is applying the skills he learned in the research group of his adviser, Ravishankar K. Iyer, who is driving an initiative between Illinois and the Mayo Clinic in computational genomics.

“This work focuses on bringing machine learning techniques and system innovations into genomic research,” said Iyer, CSL professor and the George and Ann Fisher Distinguished Professor of Engineering. “This research has significant implications for the diagnosis and treatment of disease.”

Banerjee has developed a tool called the Computation Engine Symphony, which leverages hardware accelerators in the computational genomic workflows that are used in variation detection and genotyping. The “accelerators” are dedicated processors that enable the team to perform the same computations up to 89x faster while consuming up to 2.8x less power.

“We studied a set of commonly used computational genomics analysis tasks and found that there are common mathematical ‘kernels’ that are repeatedly used across these analyses. We accelerated these kernels by building dedicated processors for them,” said Banerjee, a PhD student in computer science. “When we started this process, the workflows would take 70 hours on the Blue Waters petascale supercomputer (in Illinois’ National Center for Supercomputing Applications); today on a system with our accelerators, we can do it in 40 minutes.”

Once the data has been analyzed, it can be compared to other human references to see whether a person is susceptible to certain diseases or drugs. One of the applications of this research is to personalize treatment for major depressive disorder. When a person’s genome is analyzed, the genetic markers can be used to develop a prognosis and then to prescribe the best treatments.

“Rapid analysis of genomic data plays a vital role in realizing the promise of personalized and precision medicine, offering dramatic improvements in both diagnosis and treatment of a wide range of health conditions,” said Iyer, professor of electrical and computer engineering.

While incorporating accelerators has several benefits, their use in scientific applications such as computational genomics makes programming these emerging computer systems, which combine traditional processors with novel accelerators, more difficult because of the added complexity. To address this problem, Banerjee and his student collaborators in Iyer’s research group worked to build a domain-driven reinforcement learning model, called Symphony, for scheduling accelerated workloads on large classes of heterogeneous processors that may include CPUs, GPUs, and FPGAs.

“Subho has designed a data-driven machine learning system that integrates real-time performance measurements, prior knowledge about workloads, and information about the system to provide better resource utilization and scheduling,” said Iyer. “The result is a much more efficient processing system which ultimately saves time and money. The savings make the analysis more feasible in a clinical setting.”

The group plans to extend this work to other precision medicine applications including real-time analysis of epilepsy data and other neurodegenerative diseases that are being tackled by fellow students in Iyer’s group. Building low-power, high-performance accelerators for these emerging applications, and allowing them to be used in larger scientific workflows through the Symphony scheduler, can potentially bring more personalized medicine to more people, Banerjee says.

A paper on this research was recently accepted for presentation at the International Conference on Machine Learning and will officially go into print next month. This research is supported by the National Science Foundation, IBM, Intel and Xilinx.

This story was published June 4, 2020.