be cached. Many techniques in hardware and even in software—where the compiler discovers
the repetition [60, 61]—have been proposed to exploit this property.
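As an illustration of the idea (written here for clarity; the table size, the hashing, and the choice of operation are assumptions, not details from the cited techniques), a small software sketch of work reuse might look as follows:

```c
#include <stdint.h>

/* Illustrative sketch of work reuse in software (not from the text):
 * a small direct-mapped "computation cache" remembers the last result
 * stored at each slot, so a repeated operand pair can be satisfied
 * without re-executing the expensive operation. */
#define ENTRIES 64

struct reuse_entry { uint32_t a, b, result; int valid; };
static struct reuse_entry table[ENTRIES];

/* Expensive operation stand-in; caller must ensure b != 0. */
uint32_t divide_with_reuse(uint32_t a, uint32_t b)
{
    unsigned idx = (a ^ b) % ENTRIES;            /* simple hash of the operands */
    if (table[idx].valid && table[idx].a == a && table[idx].b == b)
        return table[idx].result;                /* reuse: skip the computation */

    uint32_t r = a / b;                          /* the "expensive" work        */
    table[idx] = (struct reuse_entry){ a, b, r, 1 };
    return r;
}
```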
A concept related to work reuse is value prediction [156]. Value prediction guesses
the outcome of a computation but does not guarantee the correctness of the result. As such,
although it is effective at breaking dependence chains by guessing ahead, it requires verification. Full
re-execution of the value-predicted computation does not save any switching activity; in fact,
value prediction adds to the existing switching activity by accessing the prediction structures.
For this reason we do not expand further on value prediction.
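Although we do not pursue value prediction further, the guess-then-verify pattern just described can be sketched as follows (a hypothetical last-value predictor; the table organization and sizes are illustrative assumptions):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical last-value predictor: predict that the instruction at a
 * given PC produces the same value it produced last time. Dependent work
 * can proceed on the guess, but the real result must verify it; a
 * mismatch forces re-execution, so no switching activity is saved and
 * the table accesses add their own. */
#define VP_ENTRIES 1024

static uint64_t last_value[VP_ENTRIES];

static unsigned vp_index(uint64_t pc) { return (unsigned)(pc % VP_ENTRIES); }

/* Guess the result before the computation completes. */
uint64_t vp_predict(uint64_t pc)
{
    return last_value[vp_index(pc)];
}

/* Verify once the real result is available; train the table either way.
 * Returns false when the speculative consumers must be squashed. */
bool vp_verify(uint64_t pc, uint64_t actual)
{
    bool correct = (last_value[vp_index(pc)] == actual);
    last_value[vp_index(pc)] = actual;
    return correct;
}
```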
Cache hierarchy: The cache hierarchy itself, besides being a performance optimization, is also
a power optimization, in the sense that it steers the majority of accesses to small and power-
efficient (lower capacitance) memory structures. To put it another way, the memory hierarchy
is a natural way to minimize switching activity in successively larger and more power-hungry
caches. A typical cache hierarchy composed of small L1s (instructions and data), and successively
larger caches (L2, L3), is intentionally designed so that most accesses are satisfied as close to
the processor as possible. The reason the levels closest to the processor end up with
comparatively the largest share of the power budget is exactly this behavior: being
more efficient per access, they take on the burden of satisfying the most accesses.11 Here, three
low-power approaches that exploit this characteristic of the cache hierarchy are presented: the
filter cache, the loop cache, and the trace cache. The last one, the trace cache, combines work reuse
(caching the work of instruction decoders) with caching of the instruction L1.
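A back-of-the-envelope calculation illustrates why this organization keeps switching energy low. The per-access energies and miss rates in the sketch below are invented placeholders, not figures from the text; the point is only that when the small L1 satisfies most accesses, the average energy per access stays far below the cost of reaching the larger levels or main memory.

```c
#include <stdio.h>

/* Illustrative only: per-access energies (nJ) and local miss rates are
 * made-up placeholder numbers, not measurements from the text. */
int main(void)
{
    double e_l1 = 0.1, e_l2 = 0.5, e_l3 = 2.0, e_mem = 20.0;  /* energy per access (nJ) */
    double m_l1 = 0.05, m_l2 = 0.30, m_l3 = 0.40;             /* local miss rates       */

    /* Every access touches the L1; only misses propagate to the next level. */
    double e_avg = e_l1 + m_l1 * (e_l2 + m_l2 * (e_l3 + m_l3 * e_mem));

    /* 0.275 nJ with these numbers: far below the cost of one DRAM access. */
    printf("average energy per access: %.3f nJ\n", e_avg);
    return 0;
}
```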
Dynamic power in caches: Dynamic power consumption in caches (but also in other
memory structures, e.g., SRAMs, registers, CAMs) depends primarily on two factors: the
size of the memory structure (C) and its access activity (A). Size matters, since accessing
a physically larger memory requires more power even when the number of accessed bits
per access remains constant. This is simply a consequence of the larger decoders and the
longer (higher-capacitance) bit/word-lines of larger memories. At the same time, speed
is also affected by memory size as a consequence of wire delay being proportional to the
square of the wire length. Smaller memory is both faster and more power-efficient. Thus,
it is not surprising that caches optimized for speed are also fairly well sized for power.
Size optimization in caches, which affects the total capacitance C, is usually done statically via
sub-banking, bit-line segmentation (see "Sidebar: Bit-line Segmentation"), and similar techniques. CACTI,
a popular tool that analytically calculates latency, power, and area for cache organizations,
automatically performs such optimizations, giving priority to speed [204].
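As a rough illustration of why such size optimizations matter (all numbers below are invented placeholders), the per-access switching energy follows the usual C·V² relation, so activating a single sub-bank instead of the whole array scales the cost down accordingly:

```c
#include <stdio.h>

/* Rough sketch of the standard switching-energy relation E = C * V^2 per
 * access (activity factor folded in), with invented capacitance values.
 * The point: sub-banking cuts the capacitance that actually switches on
 * each access, so per-access energy tracks the bank size, not the array. */
int main(void)
{
    double vdd     = 1.0;       /* supply voltage (V), illustrative            */
    double c_total = 8e-12;     /* capacitance if the whole array switched (F) */
    int    nbanks  = 8;         /* sub-banking: only one bank is activated     */
    double c_eff   = c_total / nbanks;

    printf("monolithic access : %.2e J\n", c_total * vdd * vdd);
    printf("sub-banked access : %.2e J\n", c_eff * vdd * vdd);
    return 0;
}
```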
11 On the other hand, going to main memory incurs a significant power penalty at the chip interface because of the
chip's I/O drivers and external buses. Fortunately, because of caching, few accesses manage to reach main memory.