rates, and (3) reduce mean time to repair. All of these approaches, in various
combinations, are used in practice.
Here are some common approaches:
Increasing redundancy with more redundant disks. Rather than having
a single redundant block per group (e.g., using two mirrored disks or using one
parity disk for each stripe), systems can use double redundancy (e.g., three disk
replicas or two error correction disks for each stripe). In some cases, systems
may use even more redundancy. For example, the Google File System (GFS)
is designed to provide highly reliable and available storage across thousands of
disks; by default GFS stores each data block on three different disks.
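To make the triple-replication idea concrete, here is a minimal sketch of n-way
block replication in the spirit of the GFS default described above; the in-memory
"disks," the ReplicatedStore class, and its write/read methods are illustrative
stand-ins, not GFS's actual interfaces.

class ReplicatedStore:
    def __init__(self, disks, copies=3):
        self.disks = disks          # list of dicts standing in for disks
        self.copies = copies

    def write(self, block_id, data):
        # Place the block on `copies` distinct disks, spread by a simple hash.
        start = hash(block_id)
        for i in range(self.copies):
            self.disks[(start + i) % len(self.disks)][block_id] = data

    def read(self, block_id):
        # Any surviving replica can serve the read.
        for disk in self.disks:
            if block_id in disk:
                return disk[block_id]
        raise IOError("all replicas of %s lost" % block_id)

disks = [dict() for _ in range(6)]
store = ReplicatedStore(disks, copies=3)
store.write("blk-42", b"important data")
disks[0].clear()      # simulate two full-disk failures...
disks[1].clear()
assert store.read("blk-42") == b"important data"   # ...a third copy still survives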
A dual redundancy array is sometimes called RAID 6. To ensure that data
can be reconstructed despite any two failures in a stripe, error blocks are gen-
erated using erasure codes such as Reed-Solomon codes.
Definition: dual redundancy array
Definition: RAID 6
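As an illustration of how two error blocks let a stripe survive any two failures,
here is a minimal sketch of the common P+Q construction over GF(2^8): the P
block is plain XOR parity and the Q block is a Reed-Solomon-style weighted sum.
The generator g = 2, the reduction polynomial 0x11D, the helper names, and the
example stripe values are conventional or made-up choices, not taken from the text.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by x^8+x^4+x^3+x^2+1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a: int) -> int:
    # a^254 = a^-1 because the multiplicative group of GF(2^8) has order 255.
    return gf_pow(a, 254)

def pq_parity(data):
    """Compute the P (XOR) and Q (weighted) parity bytes, one byte per data disk."""
    p, q = 0, 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q

def recover_two_data(data, x, y, p, q):
    """Rebuild data disks x and y (x < y) from the survivors plus P and Q."""
    pxy = qxy = 0
    for i, d in enumerate(data):
        if i in (x, y):
            continue
        pxy ^= d
        qxy ^= gf_mul(gf_pow(2, i), d)
    gyx = gf_pow(2, y - x)
    denom_inv = gf_inv(gyx ^ 1)
    a = gf_mul(gyx, denom_inv)
    b = gf_mul(gf_inv(gf_pow(2, x)), denom_inv)
    dx = gf_mul(a, p ^ pxy) ^ gf_mul(b, q ^ qxy)
    dy = (p ^ pxy) ^ dx
    return dx, dy

# One byte from each of six data disks in a stripe (made-up values).
stripe = [0x37, 0xA2, 0x00, 0x5C, 0xF1, 0x19]
p, q = pq_parity(stripe)
dx, dy = recover_two_data(stripe, 1, 4, p, q)   # pretend disks 1 and 4 failed
assert (dx, dy) == (stripe[1], stripe[4])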
A system with dual redundancy can be much more reliable than a simple
single redundancy RAID. With dual redundancy, the most likely data loss sce-
narios are (a) three full-disk failures or (b) a double-disk failure combined with
one or more nonrecoverable read errors.
If we optimistically assume that failures are independent and occur at a
constant rate, a system with two redundant disks per stripe has a potentially
low combined data loss rate:
FailureRate_dual+indep+const = (N / MTTF) × (MTTR · (G − 1) / MTTF) × (MTTR · (G − 2) / MTTF + P_fail_recovery_read)
This data loss rate is nearly MTTF / (MTTR · (G − 1)) times better than the
single-parity data loss rate; for disks with MTTFs of over one million hours,
MTTRs of under 10 hours, and group sizes of ten or fewer disks, double parity
improves the estimated rate by about a factor of 10,000.
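A quick numeric check of this claim, as a sketch: the MTTF, MTTR, and group
size below are the figures quoted above, while the system size N and the
nonrecoverable-read probability P_fail_recovery_read are illustrative assumptions.

MTTF = 1_000_000.0           # hours per disk (over one million hours)
MTTR = 10.0                  # hours to rebuild a failed disk (under 10 hours)
G = 10                       # disks per group/stripe (ten or fewer)
N = 100                      # total disks in the system (assumed)
P_fail_recovery_read = 0.01  # nonrecoverable read error during rebuild (assumed)

# Dual-redundancy data loss rate under the independent, constant-rate assumptions.
rate_dual = (N / MTTF) * (MTTR * (G - 1) / MTTF) * (MTTR * (G - 2) / MTTF
                                                    + P_fail_recovery_read)

# Improvement over single parity is roughly MTTF / (MTTR * (G - 1)).
improvement = MTTF / (MTTR * (G - 1))

print("dual-redundancy data loss rate ~ %.3g per hour" % rate_dual)
print("improvement over single parity ~ %.0f x" % improvement)  # ~11,000, about 10^4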
We emphasize, however, that the above equation almost certainly underes-
timates the likely data loss rate for real systems, which may suffer correlated
failures, varying failure rates, higher failure rates than advertised, and so on.
Reducing nonrecoverable read error rates with scrubbing. A storage
device's sector-level error rates are typically expressed as a single nonrecoverable
read rate, suggesting that the rate is constant. The reality is more complex.
Depending on the device, errors may accumulate over time, and heavier work-
loads may increase the rate at which errors accumulate.
An important technique for reducing a disk's nonrecoverable read rate is
scrubbing: periodically reading the entire contents of a disk, detecting sectors
Definition: scrubbing
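A minimal sketch of what a scrubbing pass might look like, assuming a disk
object with read_sector/write_sector methods and a reconstruct callback that
rebuilds a bad sector from the array's redundancy (mirror or parity); all of
these interfaces are illustrative, not a real disk API.

import time

def scrub(disk, num_sectors, reconstruct, pause_s=0.0):
    """Read every sector; repair any nonrecoverable read errors from redundancy."""
    repaired = 0
    for sector in range(num_sectors):
        try:
            disk.read_sector(sector)
        except IOError:
            # A latent sector error found before it can coincide with a disk
            # failure: rebuild the lost data and rewrite it, which lets the
            # drive remap the bad sector.
            disk.write_sector(sector, reconstruct(sector))
            repaired += 1
        if pause_s:
            time.sleep(pause_s)   # throttle so scrubbing stays a background task
    return repaired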
 