may increase over time. Both of these factors may result in significantly higher
failure rates than expected.
Example: Combined failure rate.
Question: For the system described in the previous examples (100 disks,
rotating parity with a group size of 10, mean time to failure of $10^6$
hours, mean time to repair of 10 hours, and nonrecoverable read
error rate of one sector per $10^{15}$ bits), assuming that all failures
are independent, estimate the MTTDL when both double-disk
and single-disk-and-sector failures are considered.
Answer:
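$$
\begin{aligned}
\text{FailureRate}_{\text{indep+const}}
  &= \frac{N}{\text{MTTF}} \left( \frac{\text{MTTR}\,(G-1)}{\text{MTTF}} + P_{\text{fail recovery read}} \right) \\
  &= \frac{100 \text{ disks}}{10^6 \text{ hours}} \left( \frac{10 \text{ hours} \cdot (10-1)}{10^6 \text{ hours}} + 0.0694 \right) \\
  &\approx \frac{1}{10^4 \text{ hours}} \left( \frac{1}{10^4} + 0.0694 \right) \\
  &= \frac{1}{10^4 \text{ hours}} \, (0.0695) \\
  &= 6.95 \times 10^{-6} \ \frac{\text{failures}}{\text{hour}}
\end{aligned}
$$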
Inverting the failure rate gives the mean time to data loss:
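$$
\begin{aligned}
\text{MTTDL}_{\text{const+indep}}
  &= \frac{1}{\text{FailureRate}_{\text{indep+const}}} \\
  &= 1.44 \times 10^5 \ \frac{\text{hours}}{\text{failure}} \\
  &= 16.4 \ \frac{\text{years}}{\text{failure}}
\end{aligned}
$$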
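The arithmetic in this example can be checked with a short script. The following is a minimal sketch, not part of the original text: the function name `combined_failure_rate` and its parameter names are hypothetical, the read-error probability of 0.0694 is taken as given from the example rather than derived, and a year is assumed to be 365 days.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: 365-day year


def combined_failure_rate(n_disks, group_size, mttf_hours, mttr_hours,
                          p_fail_recovery_read):
    """Data-loss rate (failures per hour), assuming independent failures
    that occur at a constant rate."""
    # Rate at which some disk in the system fails.
    first_failure_rate = n_disks / mttf_hours
    # Probability that one of the other G - 1 disks in the same group
    # fails before the repair completes.
    p_second_disk = mttr_hours * (group_size - 1) / mttf_hours
    # Either a second disk failure or a nonrecoverable read error during
    # recovery turns the first failure into data loss.
    return first_failure_rate * (p_second_disk + p_fail_recovery_read)


# Parameters from the example; 0.0694 is the value used in the text.
rate = combined_failure_rate(
    n_disks=100,
    group_size=10,
    mttf_hours=1e6,
    mttr_hours=10,
    p_fail_recovery_read=0.0694,
)
mttdl_hours = 1 / rate
mttdl_years = mttdl_hours / HOURS_PER_YEAR

print(f"failure rate: {rate:.3g} failures/hour")
print(f"MTTDL: {mttdl_hours:.3g} hours ({mttdl_years:.1f} years)")
```

With the example's inputs, this should reproduce a failure rate of roughly 6.95 × 10^-6 failures per hour and an MTTDL of about 16.4 years.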
Two things in the example above are worth special note. First, for these
parameters, the dominant cause of data loss is likely to be a single disk failure
combined with a nonrecoverable read error during recovery. Second, for these
parameters and this configuration, the resulting 6% chance of losing data per
year may be unacceptable for many environments. As a result, systems use
various techniques to improve the MTTDL in RAID systems.
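Both observations can be checked numerically. The sketch below is illustrative only: the variable names are hypothetical, and the annual loss probability assumes data-loss events arrive at a constant rate (an exponential model), which is one reasonable reading of the independent, constant-rate assumptions in the example.

```python
import math

HOURS_PER_YEAR = 24 * 365  # assumption: 365-day year

# Parameters from the example above.
n_disks = 100
mttf_hours = 1e6
p_second_disk = 10 * (10 - 1) / 1e6   # MTTR * (G - 1) / MTTF
p_read_error = 0.0694                 # value used in the example

# Split the combined rate into its two causes of data loss.
disk_failure_rate = n_disks / mttf_hours
double_disk_rate = disk_failure_rate * p_second_disk
disk_plus_sector_rate = disk_failure_rate * p_read_error
total_rate = double_disk_rate + disk_plus_sector_rate

# Fraction of data-loss events caused by a read error during recovery.
read_error_share = disk_plus_sector_rate / total_rate

# Probability of at least one data-loss event in a year, assuming events
# arrive at a constant rate (exponential model).
annual_loss_probability = 1 - math.exp(-total_rate * HOURS_PER_YEAR)

print(f"share due to read errors: {read_error_share:.2%}")
print(f"annual loss probability: {annual_loss_probability:.1%}")
```

Under these assumptions, the read-error path accounts for well over 99% of the data-loss rate, and the annual loss probability comes out close to 6%.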
Improving RAID reliability
What can be done to further improve reliability? Broadly speaking, we can
do three things: (1) increase redundancy, (2) reduce nonrecoverable read error
 