1.2 Evaluation Criteria
Having defined what an operating system does, how should we choose among
alternative approaches to the design challenges posed by operating systems?
We next discuss several desirable criteria for operating systems. In many cases,
tradeoffs between these criteria are inevitable: improving a system along one
dimension will hurt it along another. We conclude with a discussion of some
concrete examples of tradeoffs between these considerations.
1.2.1 Reliability
Perhaps the most important characteristic of an operating system is its
reliability. Reliability means that a system does exactly what it is designed
to do. Because the operating system is the lowest level of software running on
the system, errors in operating system code can have devastating and hidden
effects. If the operating system breaks, the user will often be unable to get
any work done and, in some cases, may even lose previous work, e.g., if the
failure corrupts files on disk. By contrast, application failures can be much
more benign, precisely because the operating system provides fault isolation
and a rapid and clean restart after an error.
Making the operating system reliable is challenging. Operating systems often
operate in a hostile environment, where computer viruses and other malicious
code may be trying to take control of the system for their own purposes by
exploiting design or implementation errors in the operating system's defenses.
Unfortunately, the most common ways to improve software reliability,
such as running test cases for common code paths, are less effective when applied
to operating systems. Since malicious attacks can target a specific vulnerability
precisely to cause execution to follow a rare code path, literally everything has to
work correctly for the operating system to be reliable. Even without malicious
attacks that trigger bugs on purpose, extremely rare corner cases can occur
regularly in the operating system context. If an operating system has a million
users, a once-in-a-billion event will eventually happen to someone.
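The arithmetic behind this claim can be sketched as follows. This is only an illustration under stated assumptions: the text does not say what a "once in a billion" event is measured against, so the sketch assumes it means a probability of 1e-9 per operation, that operations are independent, and that each user performs a hypothetical 1,000 operations per day. The function name is ours.

```python
import math

def prob_at_least_one(p_event: float, trials: int) -> float:
    """Probability that an event with per-trial probability p_event
    happens at least once in `trials` independent trials."""
    if p_event >= 1.0:
        return 1.0
    # 1 - (1 - p)^n, written with log1p/expm1 so that a tiny p
    # is not lost to floating-point rounding.
    return -math.expm1(trials * math.log1p(-p_event))

# Figures from the text: a million users, a one-in-a-billion event.
users = 1_000_000
p_event = 1e-9

# Assumption (not from the text): operations per user per day.
ops_per_user_per_day = 1_000

trials_per_day = users * ops_per_user_per_day
expected_per_day = trials_per_day * p_event

print("expected occurrences per day:", expected_per_day)
print("chance of at least one in a day:", prob_at_least_one(p_event, trials_per_day))
print("chance of at least one in 30 days:", prob_at_least_one(p_event, 30 * trials_per_day))
```

Under these assumptions the event is expected about once per day somewhere in the user population, which is why a corner case that looks negligible for a single user cannot be ignored by the operating system designer.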
A related concept is availability, the percentage of time that the system is
usable. A buggy operating system that crashes frequently, losing the user's
work, is both unreliable and unavailable. A buggy operating system that crashes
frequently, but never loses the user's work and cannot be subverted by a
malicious attack, would be reliable but unavailable. An operating system that
has been subverted, but continues to appear to run normally while logging the
user's keystrokes, is unreliable but available.
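As a small worked example of the definition, availability can be computed as the share of total time during which the system was usable. The sketch below is illustrative only: the function name and the sample figures (10 hours of downtime in a year) are assumptions, not values from the text.

```python
def availability_percent(uptime_hours: float, downtime_hours: float) -> float:
    """Percentage of total time during which the system was usable."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return 100.0 * uptime_hours / total

# Hypothetical year of operation: 8,760 hours, 10 of them spent down.
hours_in_year = 24 * 365
downtime = 10.0
print(f"{availability_percent(hours_in_year - downtime, downtime):.3f}% available")
```

Note that this number says nothing about reliability: a subverted system that never crashes would score 100% here while still failing to do what it was designed to do.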
Thus, both reliability and availability are desirable. Availability is affected
by two factors: how often the system fails, measured by the mean time to
failure (MTTF), and how long it takes to restore the system to a working state
after a failure (for example, to reboot), measured by the mean time to repair
(MTTR). Availability can be im-