Varshney awarded DOE grant for data reduction research
Researchers and scientists rely on data to advance their work, but combing through mountains of it can be a problem. When there is too much, it becomes difficult to stream, store, and analyze. Five universities, along with five Department of Energy (DOE) National Laboratories across eight states, are working on that problem through a series of projects funded at $13.7 million.
Among them is CSL and ECE Professor Lav Varshney, who was awarded $332,736 of those funds from the DOE to work on 'Objective-Driven Data Reduction for Scientific Workflows' via machine learning.
"There's massive amounts of data that are being collected by a variety of scientific instruments at the national labs," said Varshney. "Not all of that is relevant or useful for the scientific purpose. And so the question is, can you reduce the data down, so that it preserves all of the useful information, but it dissipates all the irrelevant information?"
To answer that question, Varshney and his team of students are looking at one key idea in particular: symmetries in data, which can point to what is irrelevant.
"Just to take a simple example, if I'm trying to learn how biased a coin is, I can keep flipping it and seeing how many heads come up. I can estimate the bias by the fraction of heads," said Varshney. "It's just the number of heads that matters, not the order of heads and tails. And that insight allows you to reduce the amount of information you store quite considerably."
Varshney said they'll be investigating two broad approaches to learning symmetries. One is information lattice learning, which was developed by one of Varshney’s former Ph.D. students, Haizi Yu. Researchers create a lattice of possible symmetries, and then use a statistical learning algorithm to take data and figure out which symmetries are present in the data set.
"What comes out is human interpretable concepts, and then you can focus on representing in terms of those concepts rather than in terms of the raw data," said Varshney.
One of his students, Sourya Basu, has been working on another approach based on neural networks, specifically autoequivariant deep learning. This involves finding symmetries in data and incorporating them into the architecture of neural networks. Varshney said this helps both with data compression and processing.
While this project may focus on the technological burden of sifting through data, there's another benefit to it as well.
"There's the human side of it," said Varshney. "Information overload, it's not just technology, but humans as well. And so if we can reduce the amount of information that humans need to deal with, that helps with scientific discovery and scientific insight,"
Varshney is working with Brookhaven National Laboratory on this project over the next three years.