What causes whole-device failures?
Disk failures can be caused by a range of faults such as a disk head being damaged,
a capacitor failure or power surge that damages the electronics, or mechanical wear-out
that makes it difficult for the head to stay centered over a track.
Common causes of flash device failures include wear-out, when enough individual
pages fail that the device runs out of spare pages to use for remapping, and failures of
the device's electronics such as having a capacitor fail.
Note that the impact of a small number of lost sectors may be modest (e.g.,
the backup software succeeds in restoring all but a file or two) or it may be
severe (e.g., no data is restored). For example, if the sector failure corrupts the
root directory, a significant fraction of the data may be lost.
Device failures
Full disk or flash drive failures occur when a device stops being able to service
reads or writes to all sectors.
Definition: disk device failure, flash device failure
When a whole device fails, the host computer's device driver will detect
the failure, and reads and writes to the device will return error codes rather
than, for example, returning incorrect data. This explicit failure notification is
important because it reduces the amount of cross-device redundancy needed to
correct failures.
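To illustrate why knowing which device failed matters, here is a minimal sketch that
assumes a simple XOR parity scheme across devices. The passage above does not name a
particular redundancy scheme, so the parity layout, the block contents, and the helper
name xor_blocks are all hypothetical. Because the failed device reports an error, its
position is known, and a single parity block is enough to rebuild its contents.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical contents of one block on each of three data devices.
data = [b"disk0 data", b"disk1 data", b"disk2 data"]

# One extra device stores the XOR of the data blocks (assumed scheme).
parity = xor_blocks(data)

# Device 1 fails. Its reads return error codes, so we know WHICH block is missing.
failed = 1
survivors = [block for i, block in enumerate(data) if i != failed]

# XOR of the surviving data blocks and the parity block yields the missing block.
recovered = xor_blocks(survivors + [parity])
print(recovered)
```

If a device instead silently returned incorrect data, the same parity block could show
that something is inconsistent but could not identify which device is wrong, so more
redundancy would be needed to correct the error.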
Full device failure rates are typically characterized by an annual failure rate,
the fraction of disks expected to fail each year, or by a mean time before failure
(MTTF), which is the inverse of the specified constant annual failure rate. As of
2011, specified annual failure rates (or MTTFs) for spinning disks typically
range from 0.5% (1.7 × 10^6 hours) to 0.9% (1 × 10^6 hours); specified failure rates
for flash solid state drives are similar.
Definition: annual failure rate
Definition: mean time before failure (MTTF)
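The relationship between the two measures can be checked with a short calculation.
The sketch below assumes a constant failure rate and a year of 8,760 hours (leap years
ignored); the function names are illustrative and do not come from the text.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours; leap years ignored for simplicity

def mttf_hours_from_afr(afr):
    """Convert a constant annual failure rate (a fraction such as 0.005) to MTTF in hours."""
    if afr <= 0:
        raise ValueError("annual failure rate must be positive")
    return HOURS_PER_YEAR / afr

def afr_from_mttf_hours(mttf_hours):
    """Convert an MTTF in hours back to an annual failure rate (a fraction)."""
    if mttf_hours <= 0:
        raise ValueError("MTTF must be positive")
    return HOURS_PER_YEAR / mttf_hours

# The two specified rates quoted in the text.
for afr in (0.005, 0.009):
    print(f"AFR {afr:.1%} -> MTTF {mttf_hours_from_afr(afr):,.0f} hours")
```

For the rates quoted above this gives roughly 1.75 million hours for 0.5% and roughly
0.97 million hours for 0.9%, consistent with the rounded figures in the text.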
Systems with many storage devices expect to encounter frequent failures.
For example,
Pitfalls. Storage system designers must keep several pitfalls in mind when
considering advertised device failure rates.
Relying on advertised failure rates. Studies across several large
collections of spinning disks have found significant variability in failure
rates. In these studies, many systems experienced failure rates of 2%, 4%,
or higher despite advertised failure rates of under 1%.
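To get a feel for what this gap means in practice, the following sketch compares the
expected number of device failures per year at an advertised rate and at the higher
observed rates. The fleet size of 1,000 devices is a hypothetical value chosen only for
illustration, and the calculation assumes each device fails at the stated annual rate.

```python
def expected_failures_per_year(num_devices, afr):
    """Expected number of device failures per year, given a per-device annual failure rate."""
    return num_devices * afr

NUM_DEVICES = 1000  # hypothetical fleet size, not taken from the text

scenarios = [
    ("advertised", 0.009),  # upper end of the specified range quoted earlier
    ("observed", 0.02),
    ("observed", 0.04),
]

for label, afr in scenarios:
    failures = expected_failures_per_year(NUM_DEVICES, afr)
    print(f"{label:>10} AFR {afr:.1%}: about {failures:.0f} failures per year")
```

A design sized for the advertised rate would plan for roughly 9 replacements a year in
this hypothetical fleet, while the observed rates imply roughly 20 to 40.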