A Key RAID Consideration
One downside to RAID 5 is that only one drive can fail in the RAID set. If another drive fails before the failed drive is replaced and rebuilt using the parity data, data loss occurs. The window of exposure to a second drive failure should therefore be mitigated.
The period of time that a RAID 5 set spends rebuilding should be as short as possible to minimize the risk. The following designs aggravate this situation by creating longer rebuild periods:
- Very large RAID groups (think 8+1 and larger), which require more reads to reconstruct the failed drive.
- Very large drives (think 1 TB SATA and 500 GB Fibre Channel drives), which cause more data to be rebuilt.
- Slower drives that struggle heavily while simultaneously providing the data to rebuild the replaced drive and supporting production I/O (think SATA drives, which tend to be slower during the random I/O that characterizes a RAID rebuild). The rebuild period is actually one of the most stressful parts of a disk's life: not only must the disk service the production I/O workload, but it must also provide data to support the rebuild, and drives are statistically more likely to fail during a rebuild than during normal duty cycles.
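The parity mechanism behind these rebuild costs can be sketched in a few lines of Python. This is a hypothetical 4+1 stripe with tiny blocks; real arrays operate on much larger blocks, but the arithmetic is the same: parity is the XOR of the data blocks, and rebuilding one lost block means reading every surviving drive.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]   # a 4+1 stripe
parity = xor_blocks(data)

# Drive 2 fails: rebuilding it requires reading *every* surviving drive,
# which is why wide groups and large drives stretch the rebuild window.
survivors = [d for i, d in enumerate(data) if i != 2]
rebuilt = xor_blocks(survivors + [parity])
assert rebuilt == b"CCCC"
```

The amount of data read during a rebuild grows with both group width and drive size, which is exactly why the designs above lengthen the exposure window.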
The following technologies all mitigate the risk of a dual drive failure (and most arrays implement each of them to varying degrees):
- Using proactive hot sparing, which shortens the rebuild period substantially by starting the copy to the hot spare automatically before the drive fails. The failure of a disk is generally preceded by read errors (which are recoverable; they are detected and corrected using on-disk parity information) or write errors, both of which are noncatastrophic. When a threshold of these errors occurs before the disk itself fails, the array replaces the failing drive with a hot spare. This is much faster than a rebuild after the failure, because the bulk of the failing drive can be copied directly and only the portions that are failing need to use parity information from other disks.
- Using smaller RAID 5 sets (for faster rebuild) and striping the data across them using a higher-level construct.
- Using a second parity calculation and storing it on another disk.
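Why proactive sparing is so much cheaper than a post-failure rebuild can be sketched as follows. This is a simplified, hypothetical model (drives as lists of integer blocks, `None` marking an unreadable block, plain XOR parity), not any vendor's actual implementation:

```python
from functools import reduce
from operator import xor

def proactive_copy(failing, peers, parity):
    """Copy readable blocks straight from the failing drive; fall back to a
    parity rebuild only for the blocks it can no longer read (None)."""
    spare = []
    for i, block in enumerate(failing):
        if block is not None:
            # Still readable: a cheap direct copy, no parity math needed.
            spare.append(block)
        else:
            # Unreadable: reconstruct from the peer drives plus parity,
            # exactly as a full post-failure rebuild would for every block.
            spare.append(reduce(xor, (p[i] for p in peers)) ^ parity[i])
    return spare

# A 3+1 stripe: drive d1 has one unreadable block (None).
d0, d2 = [1, 2, 3], [7, 8, 9]
parity = [1 ^ 4 ^ 7, 2 ^ 5 ^ 8, 3 ^ 6 ^ 9]
assert proactive_copy([4, None, 6], [d0, d2], parity) == [4, 5, 6]
```

Only one block here needed parity reconstruction; a rebuild after outright failure would have needed it for every block, reading all peers the whole time.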
As described in the sidebar "A Key RAID Consideration," one way to protect against data loss in the event of a second drive failure in a RAID 5 set is to use another parity calculation. This type of RAID is called RAID 6 (RAID-DP is a RAID 6 variant that uses two dedicated parity drives, analogous to RAID 4). This is a good choice when large RAID groups and SATA drives are used.
Figure 6.5 shows an example of a RAID 6 4+2 configuration. The data is striped across four disks, and a parity calculation is stored on the fifth disk. A second parity calculation is stored on another disk. RAID 6 rotates the parity locations with I/O, whereas RAID-DP uses a pair of dedicated parity disks. This provides good performance and good availability, at a cost in capacity efficiency. The purpose of the second parity calculation is to allow the set to withstand a second drive failure during the rebuild period. It is important to use RAID 6 in place of RAID 5 if you meet the conditions noted in the sidebar and are unable to otherwise use the mitigation methods noted.
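A minimal sketch of how the second parity calculation enables double-failure recovery, using single-byte "drives" and the GF(2^8) arithmetic conventionally used for RAID 6's Q parity. This is an illustration, not any array vendor's actual implementation; real implementations work on whole blocks and use table-driven multiplication for speed:

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D
        b >>= 1
    return p

def gpow(a, n):
    r = 1
    for _ in range(n):
        r = gmul(r, a)
    return r

def ginv(a):
    return gpow(a, 254)  # a^254 == a^-1 in GF(2^8)

def compute_pq(data):
    """P is plain XOR parity; Q weights each data drive i by 2^i."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gmul(gpow(2, i), d)
    return p, q

def recover_two(data, p, q, x, y):
    """Rebuild data drives x and y (x < y) from the survivors plus P and Q.
    `data` still holds all values here for demonstration, but only the
    surviving indices are read."""
    pxy = qxy = 0
    for i, d in enumerate(data):
        if i not in (x, y):
            pxy ^= d
            qxy ^= gmul(gpow(2, i), d)
    a = p ^ pxy                      # Dx ^ Dy
    b = q ^ qxy                      # 2^x*Dx ^ 2^y*Dy
    gx, gy = gpow(2, x), gpow(2, y)
    dx = gmul(b ^ gmul(gy, a), ginv(gx ^ gy))
    return dx, dx ^ a

stripe = [0x11, 0x22, 0x33, 0x44]    # the data half of a 4+2 set
p, q = compute_pq(stripe)
assert recover_two(stripe, p, q, 1, 3) == (0x22, 0x44)
```

With P alone this would be ordinary RAID 5, which leaves one equation for two unknowns; the independently weighted Q term is what lets the solver isolate both missing drives.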