but others are preceded by increasing rates of non-fatal anomalies. Many
storage devices implement the SMART (Self-Monitoring, Analysis, and
Reporting Technology) interface, which provides a way for the operating
system to monitor events that may be useful in predicting failures, such
as read errors, sector remappings, inaccurate seek attempts, or failures to
spin up to the target speed.
Definition: SMART (Self-Monitoring, Analysis, and Reporting Technology)
Assuming devices behave identically. Different device models or even
different generations of the same model may have significantly different
failure behaviors. One generation might exhibit significantly higher failure
rates than expected and the next might exhibit significantly lower rates.
Example: Disk failures in large systems.
Question: Suppose you have a departmental file server with 100 disks,
each with an estimated MTTF of $1.5 \times 10^6$ hours. Estimate the
expected time until one of those disks fails. For simplicity, assume
that each disk has a constant failure rate and that disks fail
independently.
Answer: If each disk has an MTTF of $1.5 \times 10^6$ hours, then 100 disks fail at
a 100 times greater rate, giving us an MTTF of $1.5 \times 10^4$ hours. So,
although the annual failure rate of a single disk is
$\frac{1 \text{ failure}}{1.5 \times 10^6 \text{ hours}} \cdot \frac{24 \text{ hours}}{\text{day}} \cdot \frac{365 \text{ days}}{\text{year}} = 0.00584 \ \frac{\text{failures}}{\text{year}}$,
the annual failure rate of the 100-disk system is 0.584 failures per year, or 58.4%.
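The arithmetic in this answer can be checked with a short script. This is a minimal sketch under the example's own assumptions (constant failure rates, independent failures, a 365-day year); the function and variable names are illustrative and do not come from the text.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, using a 365-day year as in the example


def system_mttf_hours(disk_mttf_hours, num_disks):
    """Mean time until the first failure among independent disks.

    With constant, independent failure rates, the rates add, so the
    combined MTTF is the per-disk MTTF divided by the number of disks.
    """
    return disk_mttf_hours / num_disks


def annual_failure_rate(mttf_hours):
    """Expected failures per year for a constant failure rate of 1/MTTF."""
    return HOURS_PER_YEAR / mttf_hours


disk_mttf = 1.5e6   # hours, the estimated per-disk MTTF from the example
num_disks = 100

system_mttf = system_mttf_hours(disk_mttf, num_disks)
disk_afr = annual_failure_rate(disk_mttf)
system_afr = annual_failure_rate(system_mttf)

print(f"System MTTF: {system_mttf:.0f} hours")
print(f"Per-disk annual failure rate: {disk_afr:.5f} failures/year")
print(f"System annual failure rate: {system_afr:.3f} failures/year")
```

The system figure is an expected number of failures per year, not the probability that at least one disk fails within a year.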
Example: Pitfalls.
Question: Given the pitfalls discussed above, is the calculation above likely
to overestimate or underestimate the failure rate of the system?
Answer: Of the factors listed above, the pitfall of relying on advertised
failure rates seems most significant, and it could lead us to
significantly underestimate the failure rate of the system.
This solution does assume constant failure rates. If the disks are
very new or very old, they may suffer higher failure rates than
expected, which might cause us to underestimate the failure rate
of the system.
Because we are only interested in the average rate, the correlation
pitfall is not particularly relevant to our analysis.
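To see how much an optimistic advertised MTTF could matter, the sketch below scales the per-disk failure rate by a multiplier and recomputes the system rate. The multipliers are hypothetical values chosen only for illustration, not measured data, and the function name is an assumption of this sketch.

```python
HOURS_PER_YEAR = 24 * 365  # 365-day year, as in the example


def system_failures_per_year(advertised_mttf_hours, num_disks, rate_multiplier=1.0):
    """Expected system failures per year.

    rate_multiplier models a real per-disk failure rate that is some
    factor higher than the advertised MTTF implies (hypothetical).
    """
    per_disk_rate = rate_multiplier * HOURS_PER_YEAR / advertised_mttf_hours
    return num_disks * per_disk_rate


# Hypothetical multipliers: 1.0 reproduces the calculation in the example.
for multiplier in (1.0, 2.0, 5.0):
    rate = system_failures_per_year(1.5e6, 100, multiplier)
    print(f"multiplier {multiplier}: {rate:.3f} expected failures/year")
```

Because the system rate is linear in the per-disk rate, any factor by which the advertised figure understates real failures carries through unchanged to the system estimate.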
 