A Key RAID Consideration
One downside to RAID 5 is that only one drive can fail in the RAID set. If another drive fails before the failed drive is replaced and rebuilt using the parity data, data loss occurs. The window of exposure to a second drive failure should therefore be mitigated.
The period of time that a RAID 5 set spends rebuilding should be as short as possible to minimize the risk. The following designs aggravate this situation by creating longer rebuild periods:
- Very large RAID groups (think 8+1 and larger), which require more reads to reconstruct the failed drive.
- Very large drives (think 1 TB SATA and 500 GB Fibre Channel drives), which cause more data to be rebuilt.
- Slower drives that struggle heavily while simultaneously providing the data to rebuild the replaced drive and supporting production I/O (think SATA drives, which tend to be slower during the random I/O that characterizes a RAID rebuild). The rebuild period is actually one of the most stressful parts of a disk's life: not only must the disk service the production I/O workload, but it must also provide data to support the rebuild, and drives are statistically more likely to fail during a rebuild than during normal duty cycles.
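The parity mechanism behind these rebuild costs can be sketched in a few lines of Python. This is a hypothetical 4+1 stripe with tiny blocks; real arrays operate on much larger blocks, but the arithmetic is the same: parity is the XOR of the data blocks, and rebuilding one lost block means reading every surviving drive.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]   # a 4+1 stripe
parity = xor_blocks(data)

# Drive 2 fails: rebuilding it requires reading *every* surviving drive,
# which is why wide groups and large drives stretch the rebuild window.
survivors = [d for i, d in enumerate(data) if i != 2]
rebuilt = xor_blocks(survivors + [parity])
assert rebuilt == b"CCCC"
```

The amount of data read during a rebuild grows with both group width and drive size, which is exactly why the designs above lengthen the exposure window.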
The following technologies all mitigate the risk of a dual drive failure (and most arrays implement each of them to varying degrees):
- Using proactive hot sparing, which shortens the rebuild period substantially by starting the copy to the hot spare automatically before the drive fails. The failure of a disk is generally preceded by read errors (which are recoverable; they are detected and corrected using on-disk parity information) or write errors, both of which are noncatastrophic. When a threshold of these errors occurs before the disk itself fails, the array replaces the failing drive with a hot spare. This is much faster than a rebuild after the failure, because the bulk of the failing drive can be copied directly and only the portions that are failing need to use parity information from other disks.
- Using smaller RAID 5 sets (for faster rebuild) and striping the data across them using a higher-level construct.
- Using a second parity calculation and storing it on another disk.
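Why proactive sparing is so much cheaper than a post-failure rebuild can be sketched as follows. This is a simplified, hypothetical model (drives as lists of integer blocks, `None` marking an unreadable block, plain XOR parity), not any vendor's actual implementation:

```python
from functools import reduce
from operator import xor

def proactive_copy(failing, peers, parity):
    """Copy readable blocks straight from the failing drive; fall back to a
    parity rebuild only for the blocks it can no longer read (None)."""
    spare = []
    for i, block in enumerate(failing):
        if block is not None:
            # Still readable: a cheap direct copy, no parity math needed.
            spare.append(block)
        else:
            # Unreadable: reconstruct from the peer drives plus parity,
            # exactly as a full post-failure rebuild would for every block.
            spare.append(reduce(xor, (p[i] for p in peers)) ^ parity[i])
    return spare

# A 3+1 stripe: drive d1 has one unreadable block (None).
d0, d2 = [1, 2, 3], [7, 8, 9]
parity = [1 ^ 4 ^ 7, 2 ^ 5 ^ 8, 3 ^ 6 ^ 9]
assert proactive_copy([4, None, 6], [d0, d2], parity) == [4, 5, 6]
```

Only one block here needed parity reconstruction; a rebuild after outright failure would have needed it for every block, reading all peers the whole time.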
As described in the sidebar "A Key RAID Consideration," one way to protect against data loss in the event of a second drive failure in a RAID 5 set is to use another parity calculation. This type of RAID is called RAID 6 (RAID-DP is a RAID 6 variant that uses two dedicated parity drives, analogous to RAID 4). This is a good choice when large RAID groups and SATA drives are used.
Figure 6.5 shows an example of a RAID 6 4+2 configuration. The data is striped across four disks, and a parity calculation is stored on the fifth disk. A second parity calculation is stored on another disk. RAID 6 rotates the parity locations with I/O, whereas RAID-DP uses a pair of dedicated parity disks. This provides good performance and good availability, at a cost in capacity efficiency. The purpose of the second parity calculation is to allow the set to withstand a second drive failure during the rebuild period. It is important to use RAID 6 in place of RAID 5 if you meet the conditions noted in the sidebar and are unable to otherwise use the mitigation methods noted.
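A minimal sketch of how the second parity calculation enables double-failure recovery, using single-byte "drives" and the GF(2^8) arithmetic conventionally used for RAID 6's Q parity. This is an illustration, not any array vendor's actual implementation; real implementations work on whole blocks and use table-driven multiplication for speed:

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D
        b >>= 1
    return p

def gpow(a, n):
    r = 1
    for _ in range(n):
        r = gmul(r, a)
    return r

def ginv(a):
    return gpow(a, 254)  # a^254 == a^-1 in GF(2^8)

def compute_pq(data):
    """P is plain XOR parity; Q weights each data drive i by 2^i."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gmul(gpow(2, i), d)
    return p, q

def recover_two(data, p, q, x, y):
    """Rebuild data drives x and y (x < y) from the survivors plus P and Q.
    `data` still holds all values here for demonstration, but only the
    surviving indices are read."""
    pxy = qxy = 0
    for i, d in enumerate(data):
        if i not in (x, y):
            pxy ^= d
            qxy ^= gmul(gpow(2, i), d)
    a = p ^ pxy                      # Dx ^ Dy
    b = q ^ qxy                      # 2^x*Dx ^ 2^y*Dy
    gx, gy = gpow(2, x), gpow(2, y)
    dx = gmul(b ^ gmul(gy, a), ginv(gx ^ gy))
    return dx, dx ^ a

stripe = [0x11, 0x22, 0x33, 0x44]    # the data half of a 4+2 set
p, q = compute_pq(stripe)
assert recover_two(stripe, p, q, 1, 3) == (0x22, 0x44)
```

With P alone this would be ordinary RAID 5, which leaves one equation for two unknowns; the independently weighted Q term is what lets the solver isolate both missing drives.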