SOFTWARE FAILURE
LA Air Traffic Control
What Happened?
At about 5 pm on Tuesday, September 14, 2004, the Los Angeles air traffic-control
center suddenly lost voice contact with 400 planes it was tracking in the
southwestern United States. The Voice Switching and Control System (VSCS), designed
by the Harris Corp. of Melbourne, Florida, had unexpectedly shut down. Then the
backup system, designed to take over when such a failure occurred, crashed within a
minute after it was activated. Without the controllers' guidance, planes began
coming dangerously close to each other, resulting in several near misses.
Collisions were avoided due in part to
the quick thinking of some controllers, who used their cell phones to alert other
traffic-control centers and the airlines. But the main reason the incident wasn't a
disaster was the on-board collision avoidance systems now found in commercial
jets. These systems track the transponders of nearby aircraft and give emergency
instructions to the pilots to climb or descend at the last minute. It's likely that
several midair collisions would have resulted if the problem had occurred 10 to
15 years earlier, before planes had such avoidance systems.
Air traffic controllers in action, using voice and imaging systems.
What Caused It?
Officially, the incident was blamed on human error. The FAA reported that the
problem was “not the result of system reliability” and would have been avoided
if FAA procedures had been followed. The key procedure in this case requires that
the voice switching system be rebooted every 30 days.
The root cause, however, was traced to a software problem. The VSCS relies on
a subsystem that periodically runs built-in tests. A countdown timer in the
subsystem is used to determine when the tests will run. The timer counts down in
milliseconds, starting at the highest numeric value that the system could handle:
2^32. That's just over four billion milliseconds. It takes just under 50 days for the
timer to go from 2^32 down to zero. Unfortunately, when the timer reaches zero,
the tests cannot be run, and the system shuts down. By rebooting the system every
30 days, the timer is reset almost three weeks before it expires.
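The arithmetic behind these figures can be checked with a short Java sketch. It is only an illustration of the numbers quoted above, not the VSCS code: the class name, constant names, and method names are hypothetical, and the starting value of 2^32 milliseconds is taken from the description in the text.

```java
import java.util.concurrent.TimeUnit;

public class CountdownTimerSketch {
    // Assumed starting value of the countdown: 2^32 milliseconds.
    // A long is used because 2^32 does not fit in a 32-bit signed int.
    static final long START_MILLIS = 1L << 32; // 4,294,967,296

    // Number of milliseconds in one day (86,400,000).
    static final long MILLIS_PER_DAY = TimeUnit.DAYS.toMillis(1);

    /** Days until the countdown reaches zero if it is never reset. */
    static double daysUntilExpiry() {
        return (double) START_MILLIS / MILLIS_PER_DAY;
    }

    /** Days still left on the timer when the system is rebooted on schedule. */
    static double safetyMarginDays(int rebootIntervalDays) {
        return daysUntilExpiry() - rebootIntervalDays;
    }

    public static void main(String[] args) {
        System.out.println("Start value (ms): " + START_MILLIS);
        System.out.println("Days until zero:  " + daysUntilExpiry());
        System.out.println("Margin with a 30-day reboot (days): " + safetyMarginDays(30));
    }
}
```

The timer would run for about 49.7 days, so a 30-day reboot schedule leaves a margin of roughly 19.7 days, which matches the "almost three weeks" mentioned above.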