FIGURE 4.23: A power-challenged set-associative cache. (The figure shows the tag compare, MUX, and data ways, with the access proceeding in Steps 1–3; the arrays that switch are shaded.)
divides up large arrays into sub-arrays). Regardless, the important information conveyed in this
figure is the shaded areas of a set-associative cache where switching occurs during an access.
In a power-agnostic design, the entire cache is shaded: all tag ways and all data ways are accessed
simultaneously. All the tags of the selected set are matched against the requested address to
determine a hit or a miss. Simultaneously, all the data ways are accessed so that the data are
available by the time a possible hit is determined.
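The all-ways-switch behavior can be sketched as a toy counting model (the function names and the phased variant it is compared against are illustrative, not from the text):

```python
# Toy model: count how many arrays switch during one access of a
# W-way set-associative cache.

def conventional_access(ways):
    """Power-agnostic lookup: every tag way and every data way switches,
    regardless of hit or miss."""
    tag_arrays_switched = ways    # all tags of the set compared in parallel
    data_arrays_switched = ways   # all data ways read speculatively
    return tag_arrays_switched, data_arrays_switched

def phased_access(ways, hit):
    """Phased lookup: tags first; at most one data way, and only on a hit."""
    tag_arrays_switched = ways
    data_arrays_switched = 1 if hit else 0
    return tag_arrays_switched, data_arrays_switched

print(conventional_access(4))        # (4, 4)
print(phased_access(4, hit=True))    # (4, 1)
print(phased_access(4, hit=False))   # (4, 0)
```

The contrast between the two functions previews the phased-cache technique discussed next: the tag-side switching is unchanged, while the data-side switching collapses from all ways to at most one.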
Clearly we can do better. There is plenty of “excess” switching activity during an access
but optimizing it away may cost in performance. The techniques presented here (listed in
Table 4.9) aim to significantly reduce power while preserving as much performance as possible.
4.9.1 Phased Cache
A straightforward technique to reduce the full switching activity of a set-associative cache is
to defer accessing the data ways until a hit is determined and then to access only the correct
data way for the requested data. In other words, as the name suggests, the cache is accessed in
phases: first the tags and then (if needed) the data. This technique, one of the earliest for
dynamic power reduction in caches, is discussed in Hasegawa et al. [ 95 ] as the implementation
of the SH3 cache (Hitachi's low-power embedded RISC processor). It subsequently appears in
the L2 cache of the Alpha 21264 [ 87 ].
The benefit of phasing is a significant reduction in power for the data access, which is
linear to the miss ratio (no data ways are accessed on misses) and inversely proportional to
associativity:

P_data_new = P_data_old × (1 − miss ratio) / Ways.
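As an illustrative check of this formula (the function name and the sample figures of a 4-way cache with a 5% miss ratio are assumed for the example, not taken from the text):

```python
def phased_data_power(p_data_old, miss_ratio, ways):
    """Average data-array power under phased access: exactly one data way
    is read per hit, and none on a miss."""
    return p_data_old * (1.0 - miss_ratio) / ways

# A hypothetical 4-way cache with a 5% miss ratio retains less than a
# quarter of the original data-array power.
print(phased_data_power(1.0, 0.05, 4))   # 0.2375
```

Note that the tag-array power is unaffected; only the data-side term shrinks, which is why the savings grow with associativity.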
The cost is in performance, due to the larger latency: the data access can no longer be
partially hidden behind the tag access and tag comparison. The performance cost is significant
if performance is strongly dependent on latency: for example, in non-pipelined L1 caches or