Adve, Team Awarded $5.6 Million to Streamline Complex Modern Software
CSL Professor and Interim CS Department Head Vikram Adve will lead a five-year, $5.6 million effort to reduce the complexity and size of modern software systems, using his groundbreaking work creating the LLVM compiler infrastructure as part of the project.
The Office of Naval Research awarded the grant to Adve and two co-PIs: University of Rochester Assistant Professor John Criswell (BS CS ’03, PhD ’14, and one of Adve’s former students) and University of Utah Professor John Regehr.
ONR made the award through its Total Platform Cyber Protection program, or TPCP. With the award, ONR is seeking advances that improve the ever-growing mass of software used by government systems, allowing that software to be much more efficient, compact and secure.
“The Navy, when it comes to shipboard systems, they have enormous amounts of code that have to run on the computers on a ship as well as even greater volumes of software that’s run on land in control centers,” Adve said. “Much of this software is built on top of open-source commodity software. Because the practices that enable us to develop large software systems today are also really inefficient, they face really large costs in developing, maintaining, and testing their software.”
The project will allow Adve and his co-PIs to make use of one aspect of LLVM that he says was never fully explored – lifelong optimization. This capability allows software to be analyzed and transformed at any time before or after shipping to end-users, unlike much software today, which must be frozen prior to shipping.
Adve says he, his co-PIs and a group of eight to 10 students were already starting to work on the unexplored possibilities of lifelong optimization when ONR called for proposals. Their new project, an outgrowth of LLVM dubbed ALLVM, seemed like a good fit.
“LLVM allows you to do these kinds of late-stage software customization or optimization even after shipping the code because you ship it in this richer form,” he said. “What we’re trying to explore now is, what benefits would you get for performance, for security, for reliability if all software on a system – hence the name ALLVM – was available in a form that can be analyzed and optimized by compilers?”
Part of the project already underway involves building a database of all of the open-source software Adve and his team can find that can be compiled by Clang, the C and C++ compiler front end for LLVM. Thousands of popular Linux packages are already available in this form.
Adve calls it a Bitcode database, a reference to bitcode, the stored form of the intermediate representation that LLVM uses to analyze and optimize programs.
One benefit of a Bitcode database is the ability to comb it for duplicate fragments of code within programs and even across programs.
“Within a program you can eliminate redundancy, so you can make the program smaller,” Adve said. “One thing we’re looking at is reusing the common pieces so that when you update a program and ship a new version, you don’t have to have separate complete copies of the old version and the new version installed on a system.” For example, the project can ship nine different versions of a single, widely used database engine in the same space that a single version currently takes.
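The idea of reusing common pieces across versions can be illustrated with a small sketch. This is not the ALLVM implementation: it assumes fragments are compared by exact content using a SHA-256 hash, and the `FragmentStore` class, the fragment strings and the version names are all made up for illustration.

```python
import hashlib


class FragmentStore:
    """Hypothetical content-addressed store: each distinct fragment is kept once."""

    def __init__(self):
        self.fragments = {}  # digest -> fragment bytes
        self.programs = {}   # program name -> ordered list of digests

    def add_program(self, name, fragments):
        """Record a program as a list of references to shared fragments."""
        digests = []
        for frag in fragments:
            digest = hashlib.sha256(frag).hexdigest()
            # Store the fragment only if an identical one is not already present.
            self.fragments.setdefault(digest, frag)
            digests.append(digest)
        self.programs[name] = digests

    def rebuild(self, name):
        """Reassemble a program's fragments in their original order."""
        return [self.fragments[d] for d in self.programs[name]]

    def stored_bytes(self):
        """Total size of the unique fragments actually kept."""
        return sum(len(f) for f in self.fragments.values())


def naive_bytes(programs):
    """Size if every version kept its own complete copy of every fragment."""
    return sum(len(f) for frags in programs.values() for f in frags)


# Made-up stand-ins for compiled functions; two versions share two of three.
v1 = [b"define parse() {...}", b"define plan() {...}", b"define exec_v1() {...}"]
v2 = [b"define parse() {...}", b"define plan() {...}", b"define exec_v2() {...}"]
programs = {"engine-1.0": v1, "engine-1.1": v2}

store = FragmentStore()
for name, frags in programs.items():
    store.add_program(name, frags)

print(len(store.fragments), "unique fragments stored for", len(programs), "versions")
print(store.stored_bytes(), "bytes stored versus", naive_bytes(programs), "bytes as full copies")
```

In this toy setup, each additional version costs only the space of the fragments that actually changed. A real system would also have to recognize fragments that are equivalent but not byte-for-byte identical, which this sketch does not attempt.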
Another benefit of ALLVM and the Bitcode database, which co-PI Regehr is exploring, is the ability to use a powerful class of techniques called superoptimizers to convert sequences of code into shorter ones. A third benefit, the focus of co-PI Criswell’s work, is to improve the security of the software and to be able to measure the security improvement obtained.
Beyond the work Adve, Regehr, Criswell and their team can manage on their own, the Bitcode database and associated tools will be open source and will be shared with anyone interested in accessing or contributing to them. Adve hopes that will allow other researchers to explore ideas in compilers, software engineering, security and software reliability that he and his partners may not even have considered.