Recovery
Nooks supports three recovery managers. The default recovery manager is a kernel service that simply unloads the failed driver, leaving the system running but without the services of the driver. The restart recovery manager is a user-mode agent that similarly unloads the failed driver but then executes a script to reload and restart the driver. The shadow recovery manager performs complete recovery in the kernel, transparently to applications and to the kernel itself. Shadow driver recovery is described in more detail in the next section.
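The difference between the three managers can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `Kernel`, the three manager classes, and their `recover` method are illustrative names rather than Nooks interfaces, a driver's state is modeled as a plain dict, and the shadow manager is reduced to replaying configuration it recorded earlier (the actual shadow mechanism is the subject of the next section).

```python
# Hypothetical sketch: names and data model are illustrative, not Nooks code.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Kernel:
    """Toy table of loaded drivers; each driver's state is a dict."""
    loaded: Dict[str, dict] = field(default_factory=dict)

    def load(self, name: str) -> None:
        self.loaded[name] = {}  # a fresh instance starts with empty state

    def unload(self, name: str) -> None:
        self.loaded.pop(name, None)


class DefaultRecoveryManager:
    """Unloads the failed driver; the system runs on without it."""

    def recover(self, kernel: Kernel, name: str) -> None:
        kernel.unload(name)


class RestartRecoveryManager:
    """Unloads the driver, then runs a script that reloads it."""

    def __init__(self, script: Callable[[Kernel, str], None]) -> None:
        self.script = script  # stands in for the user-mode reload script

    def recover(self, kernel: Kernel, name: str) -> None:
        kernel.unload(name)
        self.script(kernel, name)


class ShadowRecoveryManager:
    """Greatly simplified: replays configuration recorded while the
    driver was healthy into a freshly loaded instance."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, object]] = []

    def record(self, key: str, value: object) -> None:
        self.log.append((key, value))

    def recover(self, kernel: Kernel, name: str) -> None:
        kernel.unload(name)
        kernel.load(name)
        for key, value in self.log:
            kernel.loaded[name][key] = value
```

In this toy model, only the default manager leaves the driver absent, the restart manager yields a freshly initialized driver, and only the shadow variant restores the settings the driver had before the failure.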
The XPC, object tracking, and protection domain code all provide interfaces to the recovery managers. The XPC service allows a manager to signal all the threads that are currently executing in the driver or have called through the driver and back into the kernel. The signal causes the threads to unwind out of the driver by returning to the point where they invoked the driver without executing any additional driver code.
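The unwinding behavior can be approximated in Python by using an exception as the unwind mechanism. This is only an analogy under stated assumptions: `XPCBoundary`, `call_driver`, `call_kernel`, and `DriverUnwind` are invented names, a single thread is modeled (the case of a thread that has called back into the kernel), the error value `-1` is arbitrary, and refusing new calls into a failed driver is an added assumption.

```python
# Hypothetical sketch: an exception models the XPC unwind; not Nooks code.
from typing import Any, Callable


class DriverUnwind(Exception):
    """Raised to unwind a thread out of a failed driver."""


class XPCBoundary:
    """Toy kernel/driver call boundary for a single thread."""

    def __init__(self) -> None:
        self.failed = False

    def signal_unwind(self) -> None:
        # A recovery manager marks the driver failed; threads unwind
        # the next time they would return into driver code.
        self.failed = True

    def call_driver(self, fn: Callable[..., Any], *args: Any, error: Any = -1) -> Any:
        """Kernel-to-driver call: an unwinding thread lands here, at the
        point where the kernel invoked the driver, and gets `error`."""
        if self.failed:
            return error  # assumption: no new calls enter a failed driver
        try:
            return fn(*args)
        except DriverUnwind:
            return error

    def call_kernel(self, fn: Callable[..., Any], *args: Any) -> Any:
        """Driver-to-kernel call: if the driver was signaled meanwhile,
        unwind instead of returning into driver code."""
        result = fn(*args)
        if self.failed:
            raise DriverUnwind()
        return result
```

The key property is that once the signal is raised, control skips the remaining driver code and resumes at the kernel's original call site.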
The object tracker provides an interface to recovery managers to enumerate the objects in use by a driver at the time of failure and to garbage collect the objects by releasing them to the kernel. The manager may choose both the set of objects it releases and the order in which to release them. Thus, it may preserve objects for use by the driver after recovery, such as memory-mapped I/O buffers that a hardware device continues to access.
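Choosing both the set of released objects and their order can be sketched as a selective garbage collection pass. The names `ObjectTracker`, `garbage_collect`, `keep`, and `order_key` are hypothetical, objects are reduced to a kind string, and the particular ordering in any use of this sketch is an illustrative choice, not the order Nooks uses.

```python
# Hypothetical sketch of an object tracker; names are illustrative.
from typing import Callable, Dict, List, Tuple


class ObjectTracker:
    """Records the kernel objects a driver currently holds."""

    def __init__(self) -> None:
        self.objects: Dict[int, str] = {}  # object id -> object kind
        self.next_id = 0

    def track(self, kind: str) -> int:
        oid = self.next_id
        self.next_id += 1
        self.objects[oid] = kind
        return oid

    def enumerate_objects(self) -> List[Tuple[int, str]]:
        return sorted(self.objects.items())

    def release(self, oid: int, free: Callable[[int, str], None]) -> None:
        free(oid, self.objects.pop(oid))  # hand the object back to the kernel


def garbage_collect(tracker: ObjectTracker,
                    free: Callable[[int, str], None],
                    keep: Callable[[str], bool],
                    order_key: Callable[[str], int]) -> None:
    """Release every tracked object that `keep` rejects, in the order
    chosen by `order_key`; kept objects stay tracked for the driver
    after recovery."""
    victims = [(oid, kind) for oid, kind in tracker.enumerate_objects()
               if not keep(kind)]
    for oid, _kind in sorted(victims, key=lambda item: order_key(item[1])):
        tracker.release(oid, free)
```

A manager could, for example, keep objects of a hypothetical kind `"mmio_buffer"` so the device can keep using them, while releasing everything else.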
Lightweight kernel protection domains provide similar support for recovery. The domains record the memory regions accessible to a driver and provide interfaces for enumerating the regions and for releasing the regions to the kernel.
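The bookkeeping a protection domain offers to a recovery manager can be sketched as a list of `(start, length)` regions. `ProtectionDomain` and its methods are hypothetical names; real domains are enforced by hardware page tables, which this sketch does not model.

```python
# Hypothetical sketch of protection-domain region bookkeeping.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Region = Tuple[int, int]  # (start address, length in bytes)


@dataclass
class ProtectionDomain:
    """Records the memory regions a driver may access."""
    regions: List[Region] = field(default_factory=list)

    def grant(self, start: int, length: int) -> None:
        self.regions.append((start, length))

    def can_access(self, addr: int) -> bool:
        return any(start <= addr < start + length
                   for start, length in self.regions)

    def enumerate_regions(self) -> List[Region]:
        return list(self.regions)

    def release_regions(self,
                        keep: Callable[[Region], bool] = lambda r: False
                        ) -> List[Region]:
        """Return regions to the kernel, retaining those `keep` selects."""
        released = [r for r in self.regions if not keep(r)]
        self.regions = [r for r in self.regions if keep(r)]
        return released
```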
The recovery code in Nooks consists of three components. First, the isolation components detect driver failures and notify the controller. Second, the object tracker and protection domains support cleanup operations that release the resources in use by a driver. This functionality is available to the third component, a recovery manager, whose job is to recover after a failure. The recovery manager may be customized to a specific driver or class of drivers.
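The flow from detection to a per-driver recovery manager can be sketched as a small dispatch table. `RecoveryController`, `register`, and `notify_failure` are hypothetical names, and managers are modeled as plain callables taking a driver name and a failure reason.

```python
# Hypothetical sketch of failure dispatch; not the Nooks controller API.
from typing import Callable, Dict

Manager = Callable[[str, str], None]  # (driver name, failure reason)


class RecoveryController:
    """Maps each driver to its recovery manager and invokes it on failure."""

    def __init__(self, default_manager: Manager) -> None:
        self.default_manager = default_manager
        self.managers: Dict[str, Manager] = {}

    def register(self, driver: str, manager: Manager) -> None:
        # Customize recovery for a specific driver.
        self.managers[driver] = manager

    def notify_failure(self, driver: str, reason: str) -> None:
        # Called by an isolation component or an external detector.
        manager = self.managers.get(driver, self.default_manager)
        manager(driver, reason)
```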
Failure Detection
Nooks triggers recovery when it detects a failure through software checks (e.g., parameter validation or livelock detection), processor exceptions, or notification from an external source. Specifically, the wrappers, protection domains, and object tracker notify the Nooks isolation manager of a failure when:
- The driver passes a bad parameter to the kernel, such as accessing a resource it had freed or unlocking a lock it does not hold.
- The driver allocates too much memory, such as an amount exceeding the physical memory in the computer.
- The driver executes too frequently without an intervening clock interrupt (implying livelock).
- The driver generates an invalid processor exception, such as an illegal memory access or an invalid instruction.
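The first three conditions are software checks and can be sketched in a few lines. `FailureChecks`, `DriverFailure`, and the `max_calls_per_tick` threshold are hypothetical; the source does not state the livelock limit, so it is a parameter here. The fourth condition, a processor exception, comes from hardware and is not modeled.

```python
# Hypothetical sketch of three software failure checks.
from typing import Set


class DriverFailure(Exception):
    """Raised when a check detects a driver failure."""


class FailureChecks:
    def __init__(self, physical_memory: int, max_calls_per_tick: int) -> None:
        self.physical_memory = physical_memory
        self.max_calls_per_tick = max_calls_per_tick
        self.held_locks: Set[int] = set()
        self.calls_since_tick = 0

    def lock(self, lock_id: int) -> None:
        self.held_locks.add(lock_id)

    def unlock(self, lock_id: int) -> None:
        # Bad parameter: unlocking a lock the driver does not hold.
        if lock_id not in self.held_locks:
            raise DriverFailure(f"unlock of lock {lock_id} not held")
        self.held_locks.remove(lock_id)

    def allocate(self, nbytes: int) -> None:
        # Excessive allocation: more than the machine's physical memory.
        if nbytes > self.physical_memory:
            raise DriverFailure(f"allocation of {nbytes} bytes exceeds memory")

    def driver_invoked(self) -> None:
        # Livelock: too many invocations without a clock interrupt.
        self.calls_since_tick += 1
        if self.calls_since_tick > self.max_calls_per_tick:
            raise DriverFailure("driver ran too often without a clock tick")

    def clock_interrupt(self) -> None:
        self.calls_since_tick = 0
```

Each raised `DriverFailure` corresponds to the point at which a wrapper would notify the isolation manager.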
In addition, it is possible to implement an external failure detector, such as a user- or kernel-mode agent, that notifies the controller of a failure. In all cases, the controller invokes the driver's recovery manager, which is tasked with returning the system to a functioning state.
Summary of Nooks
Device drivers are a major source of failure in modern operating systems. Nooks is a new reliability layer intended to significantly reduce driver-related failures. Nooks isolates drivers in lightweight kernel protection domains and relies on hardware and software checks to detect failures. After a failure, Nooks recovers by unloading and reloading the failed driver.