of units for a specific number of hours and checking the number of disks that perma-
nently failed. Instead of using MTTF for describing disk reliability, some hard drive
manufacturers now use AFR [34] . AFR is the estimated probability that the disk will
fail during a full year of use. Essentially, AFR can be seen as another form of MTTF
expressed in years, which can be obtained according to equation (2.1) [35] :
\[ \mathrm{AFR} = 1 - \exp\left(-\frac{8760}{\mathrm{MTTF}}\right) \tag{2.1} \]
where the factor 8760 converts the time unit from hours to years (1 year = 8760 hours)
and MTTF is expressed in hours. The advantage of using AFR as the disk reliability
metric is that it is more intuitive and easier for non-specialists to understand. For
example, for a disk with an MTTF of 300,000 hours, the AFR is 2.88% per year; that is,
the disk has a 2.88% probability of failing during one year of use.
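
As a quick check of equation (2.1), the conversion can be sketched in a few lines of Python. The function name `afr_from_mttf` and the constant `HOURS_PER_YEAR` are illustrative choices, not names taken from the cited sources.

```python
import math

HOURS_PER_YEAR = 8760  # 1 year = 8760 hours, as in equation (2.1)


def afr_from_mttf(mttf_hours: float) -> float:
    """Return the annualized failure rate for an MTTF given in hours.

    Implements equation (2.1): AFR = 1 - exp(-8760 / MTTF).
    """
    if mttf_hours <= 0:
        raise ValueError("MTTF must be a positive number of hours")
    return 1.0 - math.exp(-HOURS_PER_YEAR / mttf_hours)


if __name__ == "__main__":
    # The worked example from the text: MTTF of 300,000 hours.
    print(f"{afr_from_mttf(300_000):.2%}")  # prints 2.88%
```

For MTTF values much larger than 8760 hours, the result is close to the simple ratio 8760/MTTF, which is why AFR is often read as "MTTF expressed in years".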
However, in practice, the AFR value is sometimes not consistent with the MTTF
value specified in the datasheets of the disks [3,18] . Because of a variety of factors,
such as working temperature, work load, and so forth, actual disk drive reliability may
differ from the manufacturer's specification and vary from user to user [18] . MTTF
and AFR values of disks were comprehensively investigated according to records and
logs collected from several large production systems for every disk that was replaced
in the system [3] . According to the results of these collected records and logs, the
AFR of disks typically exceeds 1%, with 2-4% as a norm, and sometimes more than
10% can be observed. The datasheet MTTF of those disks, however, only ranges from
1,000,000 to 1,500,000 hours (i.e., an AFR of at most 0.88%). A disk reliability analysis
based on more than 100,000 of Google's ATA disks also observed average AFR values
higher than 1%, ranging from 1.7% for disks in their first year of operation
to as high as 8.6% for disks that were 3 years old [17] .
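
To make the gap between datasheet and field values concrete, equation (2.1) can be inverted to find the MTTF that a given AFR would imply. The sketch below is illustrative only: the helper name `mttf_from_afr` is hypothetical, and it assumes the observed AFR can be plugged back into equation (2.1), which is a simplification.

```python
import math

HOURS_PER_YEAR = 8760  # 1 year = 8760 hours


def mttf_from_afr(afr: float) -> float:
    """Invert equation (2.1): MTTF = -8760 / ln(1 - AFR), in hours."""
    if not 0.0 < afr < 1.0:
        raise ValueError("AFR must be strictly between 0 and 1")
    # log1p(-afr) computes ln(1 - afr) accurately for small AFR values.
    return -HOURS_PER_YEAR / math.log1p(-afr)


if __name__ == "__main__":
    # Field AFR values quoted in the text: the 2-4% norm and the 10% extreme.
    for afr in (0.02, 0.04, 0.10):
        print(f"AFR {afr:.0%} -> implied MTTF of about {mttf_from_afr(afr):,.0f} hours")
```

Under this assumption, an AFR in the 2-4% range corresponds to an implied MTTF of a few hundred thousand hours, well below the 1,000,000 to 1,500,000 hours stated in the datasheets.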
2.1.1.3 Disk reliability patterns
The failure pattern of disks is always a key aspect in the field of disk reliability study.
In some early research studies on this issue, the failure pattern of disks was assumed to
follow an exponential distribution [2,36] due to the continuous and independent
occurrence of disk failures. For example, an early study [2] stated that the life span of
disks can be characterized by an exponential distribution. In addition, to simplify the
calculation, some more recent studies that analyze data reliability have also assumed an
exponential disk/data reliability model [5,19] .
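
A minimal sketch of what the exponential assumption implies is given below. It assumes the standard exponential lifetime model with survival function R(t) = exp(-t/MTTF); the function names are hypothetical and are not taken from the cited studies. The point it illustrates is that, under this model, the chance of failing in any given year, conditional on having survived to the start of that year, is the same for every year.

```python
import math

HOURS_PER_YEAR = 8760  # 1 year = 8760 hours


def survival_probability(t_hours: float, mttf_hours: float) -> float:
    """R(t) = exp(-t / MTTF): probability that a disk is still working at time t."""
    return math.exp(-t_hours / mttf_hours)


def conditional_yearly_failure(year: int, mttf_hours: float) -> float:
    """Probability of failing during `year` (1-indexed), given survival to its start."""
    r_start = survival_probability((year - 1) * HOURS_PER_YEAR, mttf_hours)
    r_end = survival_probability(year * HOURS_PER_YEAR, mttf_hours)
    return 1.0 - r_end / r_start


if __name__ == "__main__":
    mttf = 300_000  # hours, the example value used earlier in the text
    for year in (1, 2, 3, 4, 5):
        p = conditional_yearly_failure(year, mttf)
        print(f"year {year}: conditional failure probability {p:.4%}")
```

Every year prints the same probability, which equals the AFR from equation (2.1). A model of this form therefore cannot represent a disk population whose failure rate changes with age.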
In the exponential disk reliability models, the failure rate of each disk is a constant.
However, these reliability models with a constant disk failure rate cannot explain some
of the phenomena observed in practice. It is well known that the failure
rate of disk drives follows what is often called a “bathtub” curve, where disk failure
rate is higher in the disk's early life, drops during the first year, remains relatively
constant for the remainder of the disk's useful life span, and rises again at the end of
the disk's lifetime [37] . This disk failure model underlies many of the more recent
models and simplifications, such as where the disk failure model incorporates the
bathtub curve to observe the infant mortality phenomenon in large storage systems