Hardware Reference
in 500,000 hours of total running time. If 1,000,000 drives of this model are in service
and all 1,000,000 are running simultaneously, you can expect one failure out of this entire
population every half-hour. MTBF statistics are not useful for predicting the failure of an
individual drive or a small sample of drives.
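The arithmetic behind this figure can be checked with a short sketch. The function name is made up for illustration, and the calculation assumes a constant failure rate across the whole population, which is the same simplification an MTBF rating relies on.

```python
def hours_between_failures(mtbf_hours: float, drives_in_service: int) -> float:
    """Expected interval, in hours, between failures across a population
    of drives that are all running simultaneously.

    Assumes a constant failure rate (an illustrative simplification).
    """
    if drives_in_service <= 0:
        raise ValueError("drives_in_service must be positive")
    return mtbf_hours / drives_in_service


# The example from the text: a 500,000-hour MTBF and 1,000,000 drives.
print(hours_between_failures(500_000, 1_000_000))  # 0.5 (one failure every half-hour)

# The same rating applied to one drive gives an interval of roughly 57 years,
# a number that says nothing useful about when that particular drive will fail.
print(hours_between_failures(500_000, 1) / (24 * 365))
```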
You also need to understand the meaning of the word failure. In this sense, a failure is a
fault that requires the drive to be returned to the manufacturer for repair, not an occasional
failure to read or write a file correctly.
Finally, as some drive manufacturers point out, this measure of MTBF should really be
called mean time to first failure. “Between failures” implies that the drive fails, is returned
for repair, and then at some point fails again. The interval between repair and the second
failure here would be the MTBF. In most cases, a failed hard drive that would need
manufacturer repair is replaced rather than repaired, so the whole MTBF concept is misnamed.
The bottom line is that I do not really place much emphasis on MTBF figures. For an
individual drive, they are not accurate predictors of reliability. However, if you are an
information systems manager considering the purchase of thousands of PCs or drives per year
or a system vendor building and supporting thousands of systems, it might be worth your
while to examine these numbers and study the methods used to calculate them by each
vendor. Most hard drive manufacturers designate their premium drives as Enterprise class
drives, meaning they are designed for use in environments requiring full-time usage and
high reliability and carry the highest MTBF ratings. If you can understand the vendor's
calculations and compare the actual reliability of a large sample of drives, you can
purchase more reliable drives and save time and money in service and support.
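One way to set a vendor's rating against actual reliability is to compute an observed MTBF from your own service records. The sketch below is illustrative only: the function name and the fleet numbers are hypothetical, and it uses the simple definition of total running hours divided by the number of failures.

```python
def observed_mtbf(drives: int, hours_each: float, failures: int) -> float:
    """Observed MTBF in hours: total drive-hours divided by failures seen."""
    if failures <= 0:
        raise ValueError("no failures observed; MTBF cannot be estimated")
    return drives * hours_each / failures


# Hypothetical fleet: 5,000 drives running around the clock for one year,
# of which 40 had to be returned to the manufacturer.
fleet_mtbf = observed_mtbf(drives=5_000, hours_each=24 * 365, failures=40)
print(fleet_mtbf)  # 1095000.0
```

A figure like this is only meaningful for a large sample, and only when "failure" is counted the same way the vendor counts it.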
S.M.A.R.T.
Self-Monitoring, Analysis, and Reporting Technology (S.M.A.R.T.) is an industry standard
providing failure prediction for disk drives. When S.M.A.R.T. is enabled for a given
drive, the drive monitors predetermined attributes that are susceptible to or indicative of
drive degradation. Based on changes in the monitored attributes, a failure prediction can
be made. If a failure is deemed likely to occur, S.M.A.R.T. makes a status report available
so the system BIOS or driver software can notify the user of the impending problems, perhaps
enabling the user to back up the data on the drive before any real problems occur.
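The monitoring idea described above can be modeled in a few lines. This is a simplified sketch, not a real drive interface: the class, the function names, and the sample values are hypothetical, and the comparison rule (an attribute's normalized value falling to or below a vendor-set threshold) is an assumption not spelled out in the text.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    name: str
    value: int      # normalized value reported by the drive (higher is healthier)
    threshold: int  # vendor-defined limit for this attribute


def failing_attributes(attrs):
    """Names of attributes whose value has dropped to or below the threshold."""
    return [a.name for a in attrs if a.value <= a.threshold]


def status(attrs):
    """Overall status report that the BIOS or driver software would act on."""
    return "failure predicted" if failing_attributes(attrs) else "OK"


# Hypothetical readings; real attribute sets and thresholds vary by vendor.
sample = [
    Attribute("Spin-Up Time", value=97, threshold=21),
    Attribute("Reallocated Sector Count", value=30, threshold=36),
]
print(status(sample))              # failure predicted
print(failing_attributes(sample))  # ['Reallocated Sector Count']
```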
Predictable failures are the types of failures S.M.A.R.T. attempts to detect. These failures
result from the gradual degradation of the drive's performance. According to Seagate,
60% of drive failures are mechanical, which is exactly the type of failure S.M.A.R.T. is
designed to predict.
Of course, not all failures are predictable, and S.M.A.R.T. can't help with unpredictable
failures that occur without advance warning. These can be caused by static electricity, im-