with unrecoverable read errors, reconstructing the lost data from the remaining
disks in the RAID array, and attempting to write and read the reconstructed
data to and from the suspect sector. If the writes and reads succeed, the error
was caused by a transient fault, and the disk continues to use the sector. If
the sector cannot be accessed successfully, the error is permanent, and the
system remaps that sector to a spare and writes the reconstructed data there.
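In outline, this scrub-and-remap decision looks something like the following sketch (in Python for concreteness); the callables are hypothetical stand-ins for firmware- or RAID-layer operations, not a real disk interface.

```python
def handle_unrecoverable_read(read_sector, write_sector, remap_to_spare,
                              reconstruct_from_peers, sector):
    # Hypothetical sketch: the four callables stand in for disk/RAID-layer
    # operations rather than any real API.
    data = reconstruct_from_peers(sector)  # rebuild from the surviving disks
    write_sector(sector, data)             # try writing the suspect sector
    if read_sector(sector) == data:        # read back and compare
        return "transient"                 # sector is still usable
    spare = remap_to_spare(sector)         # retire the bad sector
    write_sector(spare, data)              # store the data on the spare
    return "permanent"
```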
Reducing nonrecoverable read error rates with more reliable disks.
Different disk models promise significantly different nonrecoverable read error
rates. In particular, in 2011, many disks aimed at laptops and personal computers
claim unrecoverable read error rates of one per 10^14 bits read, while disks
aimed at enterprise servers often have lower storage densities but can promise
unrecoverable read error rates of one per 10^16 bits read. This two-order-of-magnitude
improvement greatly reduces the probability that a RAID system loses
data from a combination of a full disk failure and a nonrecoverable read error
during recovery.
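To see why this matters, consider the probability of hitting at least one unrecoverable read error while reading an entire surviving 1 TB disk during RAID reconstruction. A back-of-envelope calculation, assuming each bit fails independently (a simplification):

```python
import math

def p_read_error(disk_bytes, per_bit_error_rate):
    # P(at least one unrecoverable error when reading the whole disk),
    # under an independent per-bit error model.
    return 1 - math.exp(-disk_bytes * 8 * per_bit_error_rate)

one_tb = 10**12
print(p_read_error(one_tb, 1e-14))  # desktop-class disk: ~0.077, about 8%
print(p_read_error(one_tb, 1e-16))  # enterprise-class disk: ~0.0008
```

Under this model, reconstruction that must read a full desktop-class disk encounters an error several percent of the time, while the enterprise-class rate cuts that risk by two orders of magnitude.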
Reducing mean time to repair with hot spares. Some systems include
"hot spare" disk drives that are idle but plugged into a server so that if one of
the server's disks fails, the hot spare can be automatically activated to replace
the lost disk.
Note that even with hot spares, the mean time to repair a disk is limited by
the time it takes to write the reconstructed data to it, and this time is often
measured in hours. For example, if we have a 1 TB disk and can write at
100 MB/s, the mean time to repair the disk will be at least 10^4 seconds, or
about 3 hours. In practice, repair time may be even longer if the achieved
bandwidth is less than assumed here.
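A quick check of this arithmetic; the 30 MB/s degraded-mode figure below is an illustrative assumption, not from the text:

```python
disk_bytes = 10**12      # 1 TB disk
write_bw = 100 * 10**6   # 100 MB/s sustained write bandwidth

repair_s = disk_bytes / write_bw
print(repair_s)          # 10000.0 seconds (10^4)
print(repair_s / 3600)   # ~2.8 hours

# If recovery achieves only, say, 30 MB/s because the disk is also serving
# foreground requests, repair stretches to roughly 9 hours.
print(disk_bytes / (30 * 10**6) / 3600)
```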
Reducing mean time to repair with declustering. Disks with hundreds
of gigabytes to a few terabytes can take hours to fully write with reconstructed
data. Declustering splits reconstruction of a failed disk across multiple disks.
Declustering thus allows reconstruction to proceed in parallel, speeding recovery
and reducing MTTR.
For example, the Hadoop File System (HDFS) is a cluster file system that
writes each data block to three out of potentially hundreds or thousands of
disks. It chooses the three disks for each block more or less randomly. If
one disk fails, it rereplicates the lost blocks approximately randomly across
the remaining disks. If we have N disks, each with bandwidth B, total
reconstruction bandwidth can approach (N/2)B, since each rereplicated byte must
be read from one surviving disk and written to another. For example, if there
are 1000 disks with 100 MB/s of bandwidth each, reconstruction bandwidth can
theoretically approach 50 GB/s, allowing rereplication of a 1 TB disk's data in
tens of seconds.
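Here is that back-of-envelope model in Python; it assumes the disks themselves are the only bottleneck, an assumption the caveats below immediately qualify.

```python
n_disks = 1000           # disks in the cluster
disk_bw = 100 * 10**6    # 100 MB/s per disk
lost_bytes = 10**12      # one failed 1 TB disk to rereplicate

# Each rereplicated byte is read from one surviving disk and written to
# another, so aggregate rereplication bandwidth is at most (N/2) * B.
recon_bw = (n_disks / 2) * disk_bw
print(recon_bw / 10**9)       # 50.0 GB/s
print(lost_bytes / recon_bw)  # 20.0 seconds
```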
In practice, rereplication will be slower than this for at least three reasons.
First, resources other than the disk (e.g., the network) may bottleneck recovery.
 