Device Driver Reliability - Advanced Operating Systems and Kernel Applications


restarting the driver and replaying past requests and hence can only recover from failures that are both transient and fail-stop. Deterministic failures may recur when the driver recovers, again causing a failure. Recoverable failures must be fail-stop, because shadow drivers must detect a failure in order to conceal it from the OS and applications. Hence, shadow drivers require an isolation subsystem to detect and stop failures before they are visible to applications or the operating system.

Once the driver has restarted, the active-mode shadow returns the driver to its pre-failure state. For example, the shadow re-establishes any configuration state and then replays pending requests. Shadow drivers rely on the state machine model of drivers. Whereas the default and restart recovery managers seek to restore the driver to its unloaded state or initialized state, shadow drivers seek to restore drivers to their state at the time of failure.
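The recovery order described above (configuration first, then pending requests) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `ShadowLog`, `ToyDriver`, and `recover` are hypothetical names invented for illustration, and a real shadow driver runs inside the kernel rather than in Python.

```python
from dataclasses import dataclass, field


@dataclass
class ShadowLog:
    """Hypothetical record of what a shadow needs to rebuild driver state."""
    config: dict = field(default_factory=dict)   # last value seen per setting
    pending: list = field(default_factory=list)  # requests not yet completed

    def on_configure(self, key, value):
        self.config[key] = value

    def on_submit(self, request_id):
        self.pending.append(request_id)

    def on_complete(self, request_id):
        self.pending.remove(request_id)


class ToyDriver:
    """Stand-in for a freshly restarted driver; it records the calls it gets."""
    def __init__(self):
        self.calls = []

    def configure(self, key, value):
        self.calls.append(("configure", key, value))

    def submit(self, request_id):
        self.calls.append(("submit", request_id))


def recover(log, new_driver):
    """Re-establish configuration state first, then replay pending requests."""
    for key, value in log.config.items():
        new_driver.configure(key, value)
    for request_id in log.pending:
        new_driver.submit(request_id)


# Activity observed before the (simulated) failure.
log = ShadowLog()
log.on_configure("mtu", 1500)
log.on_submit(1)
log.on_submit(2)
log.on_complete(1)  # request 1 finished, so it must not be replayed

restarted = ToyDriver()
recover(log, restarted)
```

After `recover` runs, the restarted driver has seen the configuration call followed by the single outstanding request, which mirrors the goal of returning to the state at the time of failure rather than to a freshly initialized state.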

A shadow driver is a class driver, aware of the interface to the drivers it shadows but not of their implementations. The class orientation has two key implications. First, a single shadow driver implementation can recover from a failure of any driver in its class, meaning that a handful of different shadow drivers can serve a large number of device drivers. As previously mentioned, Linux, for example, has only 20 driver classes. Second, implementing a shadow driver does not require a detailed understanding of the internals of the drivers it shadows. Rather, it requires only an understanding of those drivers' interactions with the kernel. Thus, shadow drivers can be implemented by kernel developers with no knowledge of device specifics and have no dependencies on individual drivers. For example, if a new network interface card and driver are inserted into a PC, the existing network shadow driver can shadow the new driver without change. Similarly, drivers can be patched or updated without requiring changes to their shadows.
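As a rough illustration of the class orientation, the sketch below has one shadow serve two drivers that share an interface but differ internally. All names (`NetworkDriver`, `VendorADriver`, `VendorBDriver`, `NetworkShadow`, `set_mac`) are hypothetical and chosen only for this example; they do not correspond to any real kernel interface.

```python
from typing import Protocol


class NetworkDriver(Protocol):
    """Hypothetical class interface: the only thing the shadow knows about."""
    def set_mac(self, mac: str) -> None: ...


class VendorADriver:
    """One implementation: keeps the address in a plain attribute."""
    def __init__(self):
        self.mac = None

    def set_mac(self, mac):
        self.mac = mac


class VendorBDriver:
    """Different internals, same interface."""
    def __init__(self):
        self.registers = {"mac": None}

    def set_mac(self, mac):
        self.registers["mac"] = mac


class NetworkShadow:
    """Written once against the interface, never against a specific driver."""
    def __init__(self):
        self.mac = None

    def observe_set_mac(self, mac):
        self.mac = mac  # remember the configuration seen at the interface

    def restore(self, driver: NetworkDriver):
        if self.mac is not None:
            driver.set_mac(self.mac)


shadow = NetworkShadow()
shadow.observe_set_mac("02:00:00:00:00:01")

driver_a = VendorADriver()
driver_b = VendorBDriver()
shadow.restore(driver_a)  # same shadow code path for both drivers
shadow.restore(driver_b)
```

The shadow never touches `mac` or `registers` directly, so either driver could be patched internally without any change to `NetworkShadow`.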

Shadow Driver Operation

Shadow drivers execute in one of two modes: passive or active. Passive mode is used during normal (non-faulting) operation, when the shadow driver monitors all communication between the kernel and the device driver it shadows. This monitoring is achieved via replicated procedure calls, called taps: a kernel call to a device driver function causes an automatic, identical call to the corresponding shadow driver function. Similarly, a driver call to a kernel function causes an automatic, identical call to a corresponding shadow driver function. These passive-mode calls are transparent to the device driver and the kernel and occur only to track the state of the driver as necessary for recovery. Based on the calls, the shadow tracks the state transitions of the shadowed device driver.
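A replicated call in passive mode might be modeled as follows. This is a minimal sketch, not the actual mechanism: `Tap`, `ToyDriver`, and `PassiveShadow` are hypothetical names, only the kernel-to-driver direction is shown, and the order in which the driver and the shadow are invoked is an assumption made for the example.

```python
class ToyDriver:
    """Stand-in for a real driver with two entry points."""
    def __init__(self):
        self.opened = False

    def open(self):
        self.opened = True
        return 0

    def close(self):
        self.opened = False
        return 0


class PassiveShadow:
    """Tracks driver state transitions from the calls it is shown."""
    def __init__(self):
        self.state = "closed"
        self.history = ["closed"]

    def open(self):
        self._move("open")

    def close(self):
        self._move("closed")

    def _move(self, new_state):
        self.state = new_state
        self.history.append(new_state)


class Tap:
    """Hypothetical T-junction: forwards each call and mirrors it to the shadow."""
    def __init__(self, driver, shadow):
        self._driver = driver
        self._shadow = shadow

    def __getattr__(self, name):
        target = getattr(self._driver, name)
        mirror = getattr(self._shadow, name, None)

        def call(*args, **kwargs):
            result = target(*args, **kwargs)   # the real driver does the work
            if mirror is not None:
                mirror(*args, **kwargs)        # identical call to the shadow
            return result                      # caller sees only the driver's result

        return call


driver = ToyDriver()
shadow = PassiveShadow()
tap = Tap(driver, shadow)

rc = tap.open()
tap.close()
tap.open()
```

The caller interacts only with `tap` and receives the driver's return value, so the mirroring stays invisible to both sides while the shadow accumulates the state history it would need for recovery.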

Active mode is used during recovery from a failure. Here, the shadow performs two functions. First, it impersonates the failed driver, intercepting and responding to calls for service. Therefore, the kernel and higher-level applications continue operating as though the driver had not failed. Second, the shadow driver restarts the failed driver and brings it back to its pre-failure state. While the driver restarts, the shadow impersonates the kernel to the driver, responding to its requests for service. Together, these two functions hide recovery from the driver, which is unaware that a shadow driver is restarting it after a failure, and from the kernel and applications, which continue to receive service from the shadow.
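The first of these functions, impersonating the failed driver, can be sketched in a few lines. The model below is deliberately simplified and rests on assumptions: `FailStop`, `BrokenDriver`, `HealthyDriver`, and `ActiveModeShadow` are hypothetical names, the shadow sits directly in the call path for brevity, and the second function (impersonating the kernel to the restarting driver) is not modeled.

```python
class FailStop(Exception):
    """Models a failure that is detected and stopped by an isolation layer."""


class BrokenDriver:
    def send(self, packet):
        raise FailStop("device stopped responding")


class HealthyDriver:
    """Stand-in for the driver after a successful restart."""
    def __init__(self):
        self.sent = []

    def send(self, packet):
        self.sent.append(packet)
        return "ok"


class ActiveModeShadow:
    def __init__(self, driver, restart):
        self.driver = driver
        self.restart = restart  # hypothetical callable returning a fresh driver
        self.mode = "passive"
        self.held = []          # requests accepted on the driver's behalf

    def send(self, packet):
        if self.mode == "passive":
            try:
                return self.driver.send(packet)
            except FailStop:
                self.mode = "active"  # failure detected: start impersonating
        # Active mode: answer as the driver would and hold the request.
        self.held.append(packet)
        return "ok"

    def recover(self):
        self.driver = self.restart()
        for packet in self.held:  # replay what was accepted during the outage
            self.driver.send(packet)
        self.held.clear()
        self.mode = "passive"


replacement = HealthyDriver()
shadow = ActiveModeShadow(BrokenDriver(), restart=lambda: replacement)

r1 = shadow.send(b"a")  # failure detected here; caller still gets a reply
r2 = shadow.send(b"b")  # served entirely by the shadow
shadow.recover()
r3 = shadow.send(b"c")  # back to the restarted driver
```

From the caller's point of view all three sends succeed, and the restarted driver ends up having processed every packet in order, which is the sense in which recovery is hidden from the kernel and applications.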

Taps

As previously described, a shadow driver monitors communication between a functioning driver and the kernel and impersonates one to the other during failure and recovery. This is made possible by a new mechanism, called a tap. Conceptually, a tap is a T-junction placed between the kernel and its drivers. It is implemented as a callout from wrapper stubs. During a shadow's passive-mode operation, the tap: (1) invokes the original
