In 2010, a young CSL assistant professor began a project building off a hypothesis proposed by another CSL professor. The result was a paper whose contributions would be recognized as the most influential findings in the following decade of design automation research.
Rakesh Kumar’s “Slack Redistribution for Graceful Degradation Under Voltage Overscaling” received the Ten-Year Retrospective Most Influential Paper Award at the 25th annual Asia and South Pacific Design Automation Conference earlier this week. When Kumar first started this work, a considerable amount of design automation research was focused on the potential trade-offs between power and reliability in a circuit. But CSL Professor Janak Patel was skeptical, especially about the claimed trade-offs between voltage and reliability. Patel believed that voltage-reliability trade-offs couldn’t work because, in a circuit, either everything worked at high voltages or nothing worked at low voltages.
The first contribution of Kumar’s influential paper was confirming Patel’s hypothesis in the context of high-performance microprocessors.
“We showed quantitatively that when you reduce the voltage of a circuit, processors fail catastrophically,” said Kumar, currently an associate professor in CSL and electrical and computer engineering. “This was influential because it encouraged people to reexamine their assumptions about how processors behaved in response to voltage changes.”
For the paper’s second contribution, Kumar and his collaborators built on the first, showing that while Patel’s hypothesis was correct, it is possible to reshape circuits to prevent a catastrophic failure. Their technique, called “power aware slack redistribution,” reworks circuits so that the number of errors increases gradually rather than all at once. This led to additional work in processor design for applications that can tolerate, and even correct, errors. It also led to additional work on processors in which the number of errors, in response to voltage reduction, does not overwhelm the error correction mechanism the processors support.
For example, if an application can tolerate a 3% error rate, Kumar and his team could design a processor whose reliability characteristics match the reliability requirements of that application. Alternatively, they could design a processor that keeps the number of errors below the limit of the error correction mechanism.
“This second contribution was significant because we created a grey area for dealing with errors, so now you can get the power reliability trade-offs,” said Kumar. “It told people that as long as you’re clever about how you tolerate or correct these errors you can make it work.”
The combination of the first and second contributions significantly changed how people thought about the interaction between reliability and hardware design, and spawned a new direction for researchers. The design methodology built and used by Kumar and his team can be applied by other researchers to understand and quantify application-specific error rates, streamlining a previously cumbersome process.
In addition to being influential, the paper was one of the earliest in a line of work from Kumar that produced two PhD students who have now gone on to become professors in their own right. Kumar believes support for this type of delayed-gratification research is what makes Illinois unique.
“The award is really a reflection of how Illinois and the departments of CSL/ECE support research that may not have short-term or immediate pay-offs but has the potential for long-term impact,” said Kumar. “Illinois has always had a strong tradition of such research. I’m glad we were able to add to it in some small way.”
This is joint work with John Sartori (now at the University of Minnesota), Seokhyeong Kang (now at POSTECH), and Andrew Kahng (UCSD).