Software canaries to detect failures in computer processors

2/19/2013 by Elise King, CSL Communications

CSL Professor Rakesh Kumar and University of Minnesota Assistant Professor John Sartori, a former CSL graduate student, have received a three-year, $300,000 grant from the National Science Foundation and the Semiconductor Research Corporation to research the use of software canaries in detecting hardware failures.

Kumar said the idea of a software canary can be explained by analogy to the canaries once used in mines. Miners brought the birds underground to detect toxic gases such as carbon monoxide. When the canaries stopped singing, the miners knew the birds had been overcome and evacuated before they themselves were harmed.

"The idea of a canary, in the context of processors, is that if you can build something inside a processor that will fail before your program fails, then it’s a good detector," said Kumar, an assistant professor of electrical and computer engineering. "It means that maybe you can dial down the speed, or you can dial up the voltage so that your programs can work correctly."

Traditionally, the industry has used hardware canaries to check for problems. However, "when you're building a hardware warning system to check a different piece of hardware, you know it's a question of who checks the checker," Kumar said. Because the hardware canary suffers from the same kinds of issues as the hardware it monitors, systems built around it tend to be very conservative.

By contrast, a software canary runs on the same processor as the programs it protects, rather than on separate hardware alongside it. As a result, it can be less conservative and can potentially save power and improve performance, Kumar said.

This grant is part of the NSF/SRC Joint Initiative in Failure Resistant Systems. "It's a big program," Kumar said. "They encourage impactful research in failure resistant systems … and essentially they are looking for cross-cutting solutions." Oracle Corp. has also recently taken an interest in this research and is offering its machines for testing the software canaries.

Kumar and Sartori have worked together on several recent projects, from the time Sartori was a graduate student in Kumar's group at the University of Illinois at Urbana-Champaign. Kumar said he likes that this project gives them the opportunity to keep working together. "I'm also very happy with the fact that (Sartori) has something to support his research as soon [as] he has started," he said.

This story was published February 19, 2013.