Definition: error correcting codes
Mitigation: Error correcting codes. Error correcting codes deal with failures
when some of the bits in a sector or page are corrupted. When the device
stores data, it encodes the data with additional redundancy. Then, if a small
number of bits are corrupted in a sector or page being read, the hardware auto-
matically corrects the error, and the read successfully completes. If the damage
is more extensive, then with high likelihood the read fails and returns an error
code; being told that the device has lost data is not a perfect solution, but it is
better than having the device silently return the wrong data.
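
The correct-or-report behavior described above can be illustrated with a toy code. The sketch below is not what any particular device implements; real disks and flash use much stronger codes over whole sectors or pages. It assumes an extended Hamming code that protects 4 data bits with 4 check bits, and the names encode, decode, and ReadError are hypothetical, chosen only for this example. A single flipped bit is corrected; two flipped bits are detected and reported as a failed read.

```python
class ReadError(Exception):
    """Raised when the damage exceeds what the code can correct."""


def encode(nibble):
    """Encode 4 data bits (an int in 0..15) as a list of 8 code bits."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    # Data bits live at positions 3, 5, 6, 7; check bits at 1, 2, 4.
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    # Position 0 holds an overall parity bit so the whole word has even parity.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (data, status); raise ReadError on an uncorrectable word."""
    # The syndrome is the XOR of the positions of all set bits; it is 0 for
    # a valid word and equals the flipped position after a single-bit error.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    odd_parity = sum(code) % 2 == 1

    status = "ok"
    if odd_parity:
        # An odd number of flips: assume exactly one and repair it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code = list(code)
        code[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        # Even parity but a nonzero syndrome: two bits flipped.
        raise ReadError("uncorrectable error")

    data = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return data, status


if __name__ == "__main__":
    word = encode(0b1011)
    word[6] ^= 1                      # one corrupted bit
    print(decode(word))               # data recovered, status "corrected"
    word[2] ^= 1                      # a second corrupted bit
    try:
        decode(word)
    except ReadError as err:
        print("read failed:", err)
```

With three or more flipped bits this toy code can miscorrect and return wrong data, which is why the text says only that extensive damage is reported "with high likelihood".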
Definition: nonrecoverable read error
Definition: bit error rate
Manufacturers balance storage space overheads against error correction
capabilities to achieve an acceptable advertised sector or page failure rate,
typically expressed as the expected number of bits that can be read before
encountering an unreadable sector or page. In 2011, advertised disk and flash
nonrecoverable read error rates typically range from one sector or page per
10^14 bits read to one per 10^16 bits read. The nonrecoverable read error rate
is sometimes called the bit error rate.
Mitigation: Remapping. Disks and flash are manufactured with some number of
spare sectors or pages so that they can continue to function despite some
number of permanent sector or page failures by remapping failed sectors or pages
to good ones. Before shipping hardware to users, manufacturers scan devices
to remap bad sectors or pages caused by manufacturing defects. Later, if addi-
tional permanent failures are detected, the operating system or device firmware
can remap the failed sectors or pages to good ones.
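
As a rough illustration of remapping, the sketch below models a device with a pool of spare sectors and a table that redirects failed logical sectors to spares. It is a simplified model under stated assumptions, not actual firmware: the class RemappingDevice and its methods are hypothetical names, storage is a Python dictionary, and the contents of a failed sector are treated as lost until the sector is rewritten.

```python
class RemappingDevice:
    """Toy model: logical sectors 0..num_sectors-1 plus a pool of spares."""

    def __init__(self, num_sectors, num_spares):
        self.num_sectors = num_sectors
        # Spare physical sectors sit past the end of the visible range.
        self.spares = list(range(num_sectors, num_sectors + num_spares))
        self.remap = {}    # logical sector -> spare physical sector
        self.store = {}    # physical sector -> data

    def _physical(self, logical):
        """Translate a logical sector number to its current physical sector."""
        if not 0 <= logical < self.num_sectors:
            raise IndexError("no such sector")
        return self.remap.get(logical, logical)

    def mark_failed(self, logical):
        """Redirect a permanently failed sector to the next free spare."""
        self._physical(logical)            # validates the sector number
        if not self.spares:
            raise RuntimeError("no spare sectors left")
        self.remap[logical] = self.spares.pop(0)

    def write(self, logical, data):
        self.store[self._physical(logical)] = data

    def read(self, logical):
        # Returns None if nothing has been written since the last remap.
        return self.store.get(self._physical(logical))


if __name__ == "__main__":
    dev = RemappingDevice(num_sectors=4, num_spares=1)
    dev.write(2, b"old")
    dev.mark_failed(2)        # sector 2 now lives in the spare
    dev.write(2, b"new")
    print(dev.read(2))
```

Callers keep using the same logical sector numbers; only the translation table changes. Once the spare pool is exhausted, further permanent failures can no longer be hidden.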
Pitfalls. Although devices' nonrecoverable read error rate specifications are helpful,
designers must avoid a number of common pitfalls:
Assuming that nonrecoverable read error rates are negligible.
Storage devices' advertised error rates sound impressive, but with the
large capacities of today's storage, these error rates are non-negligible.
For example, if you completely read a 2 TB disk (about 1.6 × 10^13 bits) with
a bit error rate of 1 sector per 10^14 bits, you expect about 0.16 unreadable
sectors, so there is more than a 10% chance of encountering at least one error.
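
The arithmetic behind this example can be sketched as follows. The function name p_at_least_one_error is hypothetical, and the calculation assumes that 2 TB means 2 × 10^12 bytes, that errors are independent, and that the advertised rate holds exactly, so the result is a rough estimate at best. Under those assumptions the number of errors is approximately Poisson distributed, and the chance of at least one error is 1 - e^(-expected errors).

```python
import math


def p_at_least_one_error(bytes_read, bits_per_error):
    """Estimate the chance of hitting at least one nonrecoverable read error.

    bits_per_error is the advertised rate, e.g. 1e14 for "one sector per
    10^14 bits read". Assumes independent errors at a constant rate.
    """
    bits_read = bytes_read * 8
    expected_errors = bits_read / bits_per_error
    # Poisson approximation: P(no errors) = exp(-expected_errors).
    return 1.0 - math.exp(-expected_errors)


if __name__ == "__main__":
    two_tb = 2 * 10**12  # bytes, assuming decimal terabytes
    print(f"{p_at_least_one_error(two_tb, 1e14):.1%}")  # prints 14.8%
    print(f"{p_at_least_one_error(two_tb, 1e16):.2%}")
```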
Assuming nonrecoverable read error rates are constant. Although
a device may specify a single number as its nonrecoverable read error rate,
many factors can affect the rate at which such errors manifest. A given
device's actual bit error rate may depend on its load (e.g., some faults
may be caused by device activity), its age (e.g., some faults may become
more likely as a device ages), or even its specific workload (e.g., faults in
some sectors or pages may be caused by reads or writes to nearby sectors
or pages).
Assuming independent failures. Errors may be correlated in time or
space: finding an error in one sector may make it more likely that you