After saving your work, determine the cause of the parity error and repair the system. You
might be tempted to use an option to shut off further parity checking and simply continue
using the system as though nothing were wrong. But doing so is like unscrewing the oil
pressure warning indicator bulb on a car with an oil leak so the oil pressure light won't
bother you anymore!
ECC
ECC goes a big step beyond simple parity-error detection. Instead of just detecting an
error, ECC allows a single-bit error to be corrected, which means the system can continue
without interruption and without corrupting data. ECC, as implemented in most PCs, can
only detect, not correct, double-bit errors. Because studies have indicated that
approximately 98% of memory errors are the single-bit variety, the most commonly used type
of ECC is one in which the attendant memory controller detects and corrects single-bit
errors in an accessed data word. (Double-bit errors can be detected but not corrected.)
This type of ECC is known as single-bit error correction, double-bit error detection
(SEC-DED) and requires an additional 7 check bits over 32 bits in a 4-byte system and an
additional 8 check bits over 64 bits in an 8-byte system. If the system uses SIMMs, two
36-bit (parity) SIMMs are added for each bank (for a total of 72 bits), and ECC is done at
the bank level. If the system uses DIMMs, a single parity/ECC 72-bit DIMM is used as a bank
and provides the additional bits. RIMMs are installed in singles or pairs, depending on the
chipset and motherboard. They must be 18-bit versions if parity/ECC is desired.
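The check-bit counts quoted above (7 bits for a 32-bit word, 8 bits for a 64-bit word) can be reproduced with a short calculation. The following is a minimal sketch that assumes a Hamming-based SEC-DED code, which is the usual construction but is not named in the text; the function name secded_check_bits is made up for illustration.

```python
def secded_check_bits(data_bits):
    """Number of check bits a Hamming-based SEC-DED code needs (assumed scheme)."""
    r = 0
    # Single-error correction: r check bits must be able to name every bit
    # position (data plus check) and the "no error" case,
    # so we need 2**r >= data_bits + r + 1.
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One extra overall parity bit adds double-error detection.
    return r + 1


for width in (32, 64):
    print(width, "data bits ->", secded_check_bits(width), "check bits")
```

Under that assumption, a 64-bit word plus its 8 check bits gives the 72-bit bank width mentioned for SIMMs and DIMMs.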
ECC entails the memory controller calculating the check bits on a memory-write operation,
comparing the stored check bits with freshly calculated ones on a read operation, and, if
necessary, correcting bad bits. The additional ECC logic in the memory controller is not
very significant in this age of inexpensive, high-performance VLSI logic, but ECC does
affect memory performance. Writes must wait for the check bits to be calculated, and reads
are delayed whenever the system has to wait for corrected data. On a partial-word write,
the entire word must first be read, the affected byte(s) rewritten, and then new check
bits calculated. This turns partial-word write operations into slower read-modify-write
operations. Fortunately, this performance hit is small, on the order of a few percent at
maximum, so the trade-off for increased reliability is a good one.
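The write, read, and partial-write steps just described can be sketched in software. This is only an illustration and assumes a textbook Hamming code with one extra overall parity bit over a 32-bit word; real memory controllers do this in hardware and may use a different code. The names encode, decode, and write_byte are hypothetical.

```python
def encode(data, data_bits=32):
    """Write path: build a codeword. Index 0 holds the overall parity bit,
    power-of-two indexes hold check bits, all other indexes hold data bits."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    n = data_bits + r
    word = [0] * (n + 1)
    bit = 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # not a power of two: data position
            word[pos] = (data >> bit) & 1
            bit += 1
    for i in range(r):
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:                  # check bit p covers these positions
                parity ^= word[pos]
        word[p] = parity
    word[0] = sum(word[1:]) % 2          # makes the whole codeword even parity
    return word


def decode(word):
    """Read path: recompute the checks, then correct or flag errors.
    Returns (status, data); data is None when the error is uncorrectable."""
    n = len(word) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if word[pos]:
            syndrome ^= pos              # nonzero syndrome names the bad position
    overall = sum(word) % 2
    fixed = list(word)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= n:
        fixed[syndrome] ^= 1             # single-bit error: flip it back
        status = "corrected"
    else:
        return "uncorrectable", None     # double-bit error: detected only
    data, bit = 0, 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):
            data |= fixed[pos] << bit
            bit += 1
    return status, data


def write_byte(word, byte_index, value):
    """Partial-word write as a read-modify-write: read and check the whole
    word, replace one byte, then recalculate all check bits."""
    status, data = decode(word)
    if data is None:
        raise ValueError("uncorrectable error in stored word")
    shift = 8 * byte_index
    data = (data & ~(0xFF << shift)) | ((value & 0xFF) << shift)
    return encode(data)


stored = encode(0xDEADBEEF)
stored[17] ^= 1                          # simulate a single-bit memory error
print(decode(stored))                    # status and recovered data
```

With 32 data bits the codeword has 39 entries, that is, 32 data bits plus the 7 check bits quoted earlier. The write_byte helper shows why a partial-word write costs an extra read: the check bits depend on the whole word, so the unchanged bytes must be fetched before new check bits can be calculated.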
Most memory errors are of a single-bit nature, which ECC can correct. Incorporating
this fault-tolerant technique provides high system reliability and attendant availability.
An ECC-based system is a good choice for servers, workstations, or mission-critical
applications in which the cost of a potential memory error outweighs the additional memory
and system cost of correcting it and of ensuring that it does not detract from system
reliability.