then reloading the failed driver. Nooks focuses
on achieving backward compatibility, that is, it
sacrifices complete isolation and fault tolerance
for compatibility and transparency with existing
kernels and drivers. As a result, Nooks has the
potential to greatly improve the reliability of today's
operating systems by removing their dependence
on driver correctness.
Shadow driver recovery

Isolation techniques can reduce the frequency of system crashes,
but applications using the failed driver may still crash.
Applications receive erroneous results following a failure, and
the driver loses application state when it restarts. Most
applications are unprepared to cope with either situation. Rather,
applications reflect the conventional failure model: drivers and
the operating system either fail together or not at all.

The restart recovery manager recovers from driver failure by
unloading and then reloading the failed driver. Reloading failed
drivers is effective at preventing system crashes. However, users
of a computer are not solely interested in whether the operating
system continues to function. Often, users care more about the
applications with which they interact. If applications using
drivers fail, then the goal of improving reliability has been only
partially achieved.

With the restart recovery manager, calls into a driver that fails
and subsequently recovers may return error codes because the
recovery manager unloads the driver and invalidates open
connections to the driver during recovery. As a result, clients of
a recovered driver would themselves fail if they depend on the
driver during or after recovery. For example, audio players stopped
producing sound when a sound-card driver failed and recovered. For
the same reason, the restart recovery manager cannot restart
drivers needed by the kernel, such as disk drivers. Requests to the
disk driver fail while the driver is recovering. When the Linux
kernel receives multiple errors from a disk driver used for
swapping, it assumes that the device is faulty and crashes the
system. In addition, any settings an application or the OS had
downloaded into a driver are lost when the driver restarts. Thus,
even if the application reconnects to the driver, the driver may
not be able to process requests correctly.

These weaknesses highlight a fundamental problem with a recovery
strategy that reveals driver failures to their clients: the clients
may not be prepared to handle these failures. Rather, they are
designed for the more common case that either drivers never fail,
or, if they fail, the whole system fails.

To address these problems, shadow drivers provide a transparent
recovery mechanism for driver failures. The design of shadow
drivers reflects three principles:
1. Device driver failures should be concealed
   from the driver's clients. If the operating
   system and applications using a driver cannot
   detect that it has failed, they are unlikely
   to fail themselves.
2. Driver recovery logic should be generic.
   Given the huge number and variety of device
   drivers, it is not practical to implement
   per-driver recovery code. Therefore, the
   architecture must enable a single shadow
   driver to handle recovery for a large number
   of device drivers.
3. Recovery services should have low overhead
   when not needed. The recovery system should
   impose relatively little overhead for the
   common case (that is, when drivers are
   operating normally).
Overall, these design principles aim to protect
applications and the OS from driver failure, while
minimizing the cost required to make and use
shadow drivers.
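The three principles can be made concrete with a small sketch. This is
not the shadow driver implementation the text describes: the names
SoundDriver, ShadowDriver, DriverFailure, and the configure/play
interface are hypothetical, invented for illustration, and the
record-and-replay approach is an assumed, simplified way to hide a
driver restart from its client.

```python
class DriverFailure(Exception):
    """Raised by the toy driver when it crashes (hypothetical)."""


class SoundDriver:
    """Toy stand-in for a sound-card driver; not a real driver API."""

    def __init__(self):
        self.volume = None      # setting downloaded by the client; lost on restart
        self.fail_next = False  # hook used to inject a single failure

    def configure(self, volume):
        self.volume = volume

    def play(self, sample):
        if self.fail_next:
            self.fail_next = False
            raise DriverFailure("driver crashed")
        if self.volume is None:
            raise DriverFailure("driver not configured")
        return f"played {sample} at volume {self.volume}"


class ShadowDriver:
    """Assumed sketch of a generic shadow for one class of drivers.

    It works with any driver object exposing the same method names, so no
    per-driver recovery code is needed (principle 2).
    """

    CONFIG_CALLS = {"configure"}  # calls whose effects must survive a restart

    def __init__(self, driver_factory):
        self._factory = driver_factory
        self._driver = driver_factory()
        self._config_log = []
        self.recoveries = 0

    def call(self, name, *args):
        # Normal operation: only a cheap log append for configuration
        # calls, then the request is forwarded unchanged (principle 3).
        if name in self.CONFIG_CALLS:
            self._config_log.append((name, args))
        try:
            return getattr(self._driver, name)(*args)
        except DriverFailure:
            # Conceal the failure: restart, restore settings, and retry once
            # so the client never sees an error code (principle 1).
            self._recover()
            return getattr(self._driver, name)(*args)

    def _recover(self):
        self.recoveries += 1
        self._driver = self._factory()       # "reload" the failed driver
        for name, args in self._config_log:  # replay the lost settings
            getattr(self._driver, name)(*args)


if __name__ == "__main__":
    shadow = ShadowDriver(SoundDriver)
    shadow.call("configure", 7)
    shadow._driver.fail_next = True          # inject one crash
    print(shadow.call("play", "beep"))       # client still gets a normal result
```

With restart-only recovery, the same client would see the DriverFailure
and then talk to a reloaded driver that has forgotten its volume
setting. In this sketch the shadow absorbs both problems, at the cost
of one log append per configuration call during normal operation.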
Shadow drivers apply only to device drivers
that belong to a class and share a common
calling interface. They recover after a failure by