Signals, Inference, and Networks
How do we communicate information efficiently between points in a network, recover useful information from noisy measurements, make critical decisions and predictions from large volumes of unstructured data, including biological data, and perform all of these tasks while ensuring appropriate levels of information security and privacy? Researchers in the SINE group explore these questions, drawing on tools from information theory, probability, statistics, optimization, and machine learning.
Data Science and Machine Learning
Increasingly, both routine and critical decisions and predictions are made by sophisticated algorithms on the basis of large volumes of data. Salient examples include credit scoring, product recommendation systems, health care analytics, and financial forecasting; emerging technologies, such as self-driving cars, also rely heavily on fast, reliable, and safe algorithmic decision-making. Further, data can be used not just to reason about the world but also to create ideas and artifacts that have never been imagined before. Moreover, social and physical scientists are embracing machine learning and data analytics to contend with the massive datasets that are quickly becoming the norm in their research. How can we guarantee that the quality of predictions and decisions made by these algorithms continues to improve, while being able to store, summarize, and manipulate data at scale? How can we control overfitting when the same dataset is reused by multiple interconnected learning algorithms? How can we balance conflicting demands of predictive performance and individual or institutional privacy? Research efforts in the area of Data Science and Machine Learning focus on developing mathematical and algorithmic tools to address many of these problems.
Statistical Signal Processing
A great variety of algorithms have been developed to process and analyze a wide range of signals of interest. Examples include multimedia (speech, music, images, video), geophysical, and biomedical signals. In addition to such "natural" signals, a variety of "man-made" signals (such as flows in computer networks, radar or communication waveforms) also contain information of great interest. Modern applications require the development and implementation of highly accurate and sophisticated methods for such purposes. Statistical signal processing methods provide a principled and systematic framework for developing high-performance algorithms and understanding their fundamental limits of performance. In some cases, signals are processed by people, through crowdsourcing. Research in this area involves characterizing and learning the structural and statistical properties of the signals and the sensors that acquire them, and applying fundamental theory from statistical inference and estimation theory. Tasks of interest often include object detection or classification, parameter and model estimation, and signal reconstruction from limited, noisy measurements, as well as various means for signal compression.
Cross-disciplinary Biological Data Analysis
Recent advances in nanotechnology, material science and bioengineering have made it possible to acquire unprecedented amounts of data elucidating complex molecular and cellular functions and interactions. In order to fully exploit the information contained in genomics and neuroscience data, one has to ensure adequate data distribution and maintenance through specialized compression methods and redundant, secure cloud storage mechanisms. These issues create new challenges in information theory, computer science, bioinformatics and computational biology alike. At the same time, one has to perform data-driven information extraction/denoising, statistical analysis, algorithmic inference and model validation at scale. The aforementioned processing tasks create unique new research questions at the intersection of machine learning and signal processing, and are expected to advance our understanding of inheritance, evolution, disease onset and progression, behavior and cognition.
Representative research topics in this area include genomic data compression, compressive computing, connectomics, molecular imaging, base calling, sequence alignment and assembly, secondary and tertiary structure prediction, reverse engineering of gene regulatory networks, causal inference, driver gene community discovery, and evaluation of physical contact maps.
Network Science and Engineering
Many natural and engineered systems can be viewed as networks of interacting entities or agents. Examples include communication networks, biological networks, and social networks. What do the network structures of the Internet, Facebook, and gene regulatory networks look like? How should one design algorithms to analyze and control such networks? The goal of the research in this area is to answer such questions by developing a common set of mathematical and experimental tools to study large networks, as well as to develop algorithms for efficient operation of specific types of networks.
Communications and Coding
How can we send information from one point in space to another, or from one point in time to another (the latter is known as storage of information), at high speeds with high accuracy? The demand for communication is highly variable, and the communication medium, be it through air, under water, through the body, or a charge-coupled device (CCD) for storage, is also highly variable. Distributed control algorithms, known as protocols, and uses of redundancy, to allow correction or detection of errors, are key tools for communication system design. A diversity of new wireless applications, including the Internet of Things (IoT), is placing increased attention on high-density communication, sometimes with response times on the order of microseconds.
Security and Privacy
Data analytics is a rapidly growing field, aided by the prevalence of huge amounts of data as a result of digitalization and by the availability of significant computing power. Our increased ability to perform statistical modeling and predictive analytics has significant implications for security and privacy. For instance, it allows us to perform traffic analysis in networks. Such an ability can provide tools to secure the network (e.g., allow learning the source of an attack through network flow linking), but it can also cause privacy breaches (e.g., allow for linking users in anonymous networks). Research effort in this area focuses on characterizing fundamental limits to security and privacy problems in the presence of attackers with various degrees of computational power and access to information.