Research to improve reliability of advanced weapon, other computer systems

2/19/2013 April Dahlquist

CSL researchers Zbigniew Kalbarczyk and Ravi K. Iyer have received a four-year, $300,000 grant from the Defense Threat Reduction Agency (DTRA) for their project titled “Globally-Optimization Protection for Soft Error Resilient Robust.”

Written by April Dahlquist

Zbigniew Kalbarczyk (left) and Ravi K. Iyer

The project aims to improve the reliability of computer systems, including advanced weapons, by ensuring that they remain functional when “soft errors” occur. Soft errors are often triggered by slight environmental changes, such as a temperature rise or the presence of radiation particles, and cause temporary misbehavior in the system or application. The project will design techniques to protect the system during these short disturbances.

“If there is a misbehavior or failure in the computing device, which controls the weapons system, then we ensure that added protection can compensate for it, guaranteeing that the system is reliable even in the presence of errors,” Kalbarczyk said.

The team’s primary goal is to create a system that repairs itself when errors occur. In the event that it cannot do so, the researchers are creating a backup method that enables the system to self-diagnose problems and send an error report, preventing any decision-making based on invalid data.

“We want to certify that we can protect and provide quantifiable, measurable results to show the system will survive in the presence of accidental errors,” Kalbarczyk said.

The team will look at how to make the system robust at the chip, software and application levels. By placing protection mechanisms throughout the different levels of the system, researchers hope to achieve cost-effective and dependable solutions. It is more efficient to make the system robust at each level than to add robustness only once the system is complete, Kalbarczyk said.

As systems are increasingly produced at smaller scales, they become more sensitive and prone to errors. An important initial task for Kalbarczyk is creating the tools for validating the effectiveness of the proposed techniques.

Kalbarczyk is also looking at the tradeoff between the cost to produce a robust system and the benefit it will bring.

The work could be applied to any type of computing-based system, such as cyber infrastructure for the power grid, telecommunications and other critical environments.

Other collaborators include Subhasish Mitra of Stanford University and Klas Lilja of California-based Robust Chip Inc.


This story was published February 19, 2013.