The major consequence of this approach is that control software produced by empirical techniques can be highly unpredictable. If the critical timing constraints cannot all be verified a priori, and the operating system does not include specific mechanisms for handling real-time tasks, the system may appear to work well for some time and then collapse in rare, but possible, situations. The consequences of a failure can sometimes be catastrophic: people may be injured, or the environment may suffer serious damage.
A high percentage of the accidents that occur in nuclear power plants, space missions, and defense systems are caused by software bugs in the control system. In some cases, these accidents have caused huge economic losses or even catastrophic consequences, including the loss of human lives.
As an example, the first flight of the space shuttle was delayed, at considerable cost, because of a timing bug that arose from a transient overload during system initialization on one of the redundant processors dedicated to the control of the aircraft [Sta88]. Although the shuttle control system had been extensively tested, the timing error was not discovered. Later analysis of the process code revealed that there was only a 1 in 67 probability (about 1.5 percent) that a transient overload during initialization could push the redundant processor out of synchronization.
Another software bug was discovered in the real-time control system of the Patriot missiles used to protect Saudi Arabia during the Gulf War.¹ When a Patriot radar sights a flying object, the onboard computer calculates its trajectory and, to ensure that no missiles are launched in vain, performs a verification. If the flying object passes through a specific location, computed from the predicted trajectory, the Patriot is launched against the target; otherwise, the event is classified as a false alarm.
On February 25, 1991, the radar sighted a Scud missile directed at Saudi Arabia. The onboard computer predicted its trajectory and performed the verification, but classified the event as a false alarm. A few minutes later, the Scud fell on the city of Dhahran, causing injuries and enormous economic damage. It was later discovered that, because a long interrupt handling routine ran with interrupts disabled, the real-time clock of the onboard computer was missing some clock interrupts, accumulating a delay of about 57 microseconds per minute. On the day of the accident, the computer had been running for about 100 hours (an exceptional situation never experienced before), so it had accumulated a total delay of 343 milliseconds. This delay caused a prediction error of 687 meters in the verification phase! The bug was corrected on
¹ L'Espresso, Vol. XXXVIII, No. 14, 5 April 1992, p. 167.