- Flexicache allows the cache to operate down to 320 mV (10% failure rate),
achieving, on average, a 63% energy reduction in cache operations. The area
overhead of Flexicache is only 12% compared to a typical L1 cache.
2 Background and Related Work
In this section, we first explain the nomenclature of failures in memory
structures. Then we present the previous schemes used for scaling Vdd.
Memory Failures: Bit failures are classified into two broad categories [12]:
Persistent Failures: Random variation in the number and location of dopant
atoms in the channel region of a device leads to random variation in the
transistor threshold voltage, which causes a threshold voltage mismatch between
transistors close to each other. In an SRAM cell, a mismatch in strength between
neighbouring transistors caused by intra-die variations can result in
failure of the cell [4]. A cell failure can occur due to (1) an increase in the cell
access time, (2) an unstable read operation, (3) an unstable write operation, or
(4) a failure in the data-holding capability of the cell. Further details can be found in [30].
On the other hand, open or short circuits cause irreversible physical changes in
the semiconductor devices. These permanent failures tend to occur either early in the
processor lifetime due to manufacturing faults (called infant mortality) or
late in the lifetime due to thermal and process-related stress. The location of a
persistent failure is random and independent of whether the neighbouring bit is
faulty [20]. The locations of persistently defective bits can be detected by
performing a built-in self test (BIST) [17].
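As a rough illustration of how a self test can locate persistently defective bits, the sketch below writes a pattern to every cell of a simulated memory, reads it back, and records the addresses that disagree. It is a minimal sketch under stated assumptions, not the BIST scheme of [17]: the `FaultyMemory` class, the stuck-at fault model, and the `write_read_back_test` function are hypothetical names introduced here, and real BIST engines use more thorough March-style algorithms.

```python
from typing import Dict, List


class FaultyMemory:
    """Simulated bit array; some cells are stuck at 0 or 1 (assumed fault model)."""

    def __init__(self, size: int, stuck_at: Dict[int, int]) -> None:
        self.size = size
        self.bits = [0] * size
        self.stuck_at = stuck_at  # address -> value the cell is stuck at

    def write(self, addr: int, value: int) -> None:
        # A stuck cell ignores the written value and keeps its stuck value.
        self.bits[addr] = self.stuck_at.get(addr, value)

    def read(self, addr: int) -> int:
        return self.bits[addr]


def write_read_back_test(mem: FaultyMemory) -> List[int]:
    """Return the sorted addresses whose read-back value differs from the written one."""
    faulty = set()
    for pattern in (0, 1):  # both polarities, so stuck-at-0 and stuck-at-1 are exposed
        for addr in range(mem.size):
            mem.write(addr, pattern)
        for addr in range(mem.size):
            if mem.read(addr) != pattern:
                faulty.add(addr)
    return sorted(faulty)


# Hypothetical 16-bit memory with two persistent faults.
memory = FaultyMemory(size=16, stuck_at={3: 1, 10: 0})
fault_map = write_read_back_test(memory)
print(fault_map)  # [3, 10]
```

Because the faults are persistent, repeating the test yields the same fault map, which is what makes this kind of detection practical for this failure class.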
Non-Persistent Failures: Radiation events or power supply noise can cause a
bit flip and corrupt the data stored in a device until new data is written [8].
As transistor dimensions and operating voltages shrink, sensitivity to radiation
events increases drastically. On the other hand, process variation or in-progress
wear-out, combined with voltage and temperature fluctuations, might cause
correlated faults of short duration. These are termed intermittent faults (or erratic
failures) and last from several cycles to several seconds [13]. Diagnosing an
intermittent fault by BIST is hard since the fault does not persist and the conditions
that cause it are hard to reproduce. As Vdd decreases, the bit failure rate
increases rapidly for both intermittent faults and persistent failures [23,12].
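To see why a rising bit failure rate matters at the granularity of a cache line, the following sketch computes the probability that a line contains more failing bits than a correction scheme can handle. It assumes that bits fail independently, as stated above for persistent failures, so it does not model correlated intermittent faults. The function name, the 512-bit line size, and the bit failure rates are illustrative assumptions and are not taken from the paper.

```python
from math import comb


def prob_line_uncorrectable(p_bit: float, n_bits: int, t_correctable: int = 0) -> float:
    """P(more than t_correctable of n_bits fail), each bit failing independently with p_bit."""
    # Probability that at most t_correctable bits fail (binomial distribution).
    p_ok = sum(
        comb(n_bits, k) * p_bit**k * (1.0 - p_bit) ** (n_bits - k)
        for k in range(t_correctable + 1)
    )
    return 1.0 - p_ok


# Illustrative sweep over assumed bit failure rates for a 512-bit line.
for p in (1e-6, 1e-4, 1e-2):
    no_ecc = prob_line_uncorrectable(p, n_bits=512)
    one_bit_ecc = prob_line_uncorrectable(p, n_bits=512, t_correctable=1)
    print(f"p_bit={p:g}  no correction: {no_ecc:.3e}  1-bit correction: {one_bit_ecc:.3e}")
```

Under this model, the fraction of unusable lines grows much faster than the bit failure rate itself, which is why stronger correction or line disabling becomes necessary at low Vdd.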
Related Work: In this section, we discuss architecture-based schemes used
under voltage scaling and compare their main characteristics with Flexicache
in Table 1. Orthogonal Latin Square Code (OLSC) [18] is a state-of-the-art
ECC scheme used for level-1 caches when the supply voltage is lower than
the safe margin. Multi-Bit Segmented ECC (MS-ECC) [12] applies OLSC at a
finer granularity in order to increase its error correction capability
for use at ultra-low voltage levels. MS-ECC can thus reduce the supply voltage
down to 350 mV in 35 nm technology while providing 6.5% useful cache capacity (we
define useful cache capacity as the portion of the cache which is not disabled) [23].
Kim et al. [19] propose two-dimensional (2D) ECC to correct multi-bit errors
with a minimum area overhead in check bits. However, the correction capability
 