be cached. Many techniques in hardware and even in software—where the compiler discovers
the repetition [60, 61]—have been proposed to exploit this property.
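As an illustration of the idea (written here for clarity; the table size, the hashing, and the choice of operation are assumptions, not details from the cited techniques), a small software sketch of work reuse might look as follows:

```c
#include <stdint.h>

/* Illustrative sketch of work reuse in software (not from the text):
 * a small direct-mapped "computation cache" remembers the last result
 * stored at each slot, so a repeated operand pair can be satisfied
 * without re-executing the expensive operation. */
#define ENTRIES 64

struct reuse_entry { uint32_t a, b, result; int valid; };
static struct reuse_entry table[ENTRIES];

/* Expensive operation stand-in; caller must ensure b != 0. */
uint32_t divide_with_reuse(uint32_t a, uint32_t b)
{
    unsigned idx = (a ^ b) % ENTRIES;            /* simple hash of the operands */
    if (table[idx].valid && table[idx].a == a && table[idx].b == b)
        return table[idx].result;                /* reuse: skip the computation */

    uint32_t r = a / b;                          /* the "expensive" work        */
    table[idx] = (struct reuse_entry){ a, b, r, 1 };
    return r;
}
```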
A concept related to work reuse is value prediction [156]. Value prediction guesses
the outcome of a computation but does not guarantee the correctness of the result. As such,
although it is effective at breaking dependence chains by guessing ahead, it requires verification. Full
re-execution of the value-predicted computation does not save any switching activity; in fact,
value prediction adds to the existing switching activity by accessing the prediction structures.
For this reason we do not expand further on value prediction.
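Although we do not pursue value prediction further, the guess-then-verify pattern just described can be sketched as follows (a hypothetical last-value predictor; the table organization and sizes are illustrative assumptions):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical last-value predictor: predict that the instruction at a
 * given PC produces the same value it produced last time. Dependent work
 * can proceed on the guess, but the real result must verify it; a
 * mismatch forces re-execution, so no switching activity is saved and
 * the table accesses add their own. */
#define VP_ENTRIES 1024

static uint64_t last_value[VP_ENTRIES];

static unsigned vp_index(uint64_t pc) { return (unsigned)(pc % VP_ENTRIES); }

/* Guess the result before the computation completes. */
uint64_t vp_predict(uint64_t pc)
{
    return last_value[vp_index(pc)];
}

/* Verify once the real result is available; train the table either way.
 * Returns false when the speculative consumers must be squashed. */
bool vp_verify(uint64_t pc, uint64_t actual)
{
    bool correct = (last_value[vp_index(pc)] == actual);
    last_value[vp_index(pc)] = actual;
    return correct;
}
```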
Cache hierarchy: The cache hierarchy itself, besides being a performance optimization, is also
a power optimization, in the sense that it steers the majority of accesses to small and power-
efficient (lower capacitance) memory structures. To put it another way, the memory hierarchy
is a natural way to minimize switching activity in successively larger and more power-hungry
caches. A typical cache hierarchy composed of small L1s (instructions and data), and successively
larger caches (L2, L3), is intentionally designed so that most accesses are satisfied as close to
the processor as possible. The reason the levels closest to the processor end up with
comparatively the largest share of the power budget is exactly this behavior: being
more efficient per access, they take on the burden of satisfying the most accesses.11 Here, three
low-power approaches that exploit this characteristic of the cache hierarchy are presented: the
filter cache, the loop cache, and the trace cache. The last one, the trace cache, combines work reuse
(caching the work of instruction decoders) with caching of the instruction L1.
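A back-of-the-envelope calculation illustrates why this organization keeps switching energy low. The per-access energies and miss rates in the sketch below are invented placeholders, not figures from the text; the point is only that when the small L1 satisfies most accesses, the average energy per access stays far below the cost of reaching the larger levels or main memory.

```c
#include <stdio.h>

/* Illustrative only: per-access energies (nJ) and local miss rates are
 * made-up placeholder numbers, not measurements from the text. */
int main(void)
{
    double e_l1 = 0.1, e_l2 = 0.5, e_l3 = 2.0, e_mem = 20.0;  /* energy per access (nJ) */
    double m_l1 = 0.05, m_l2 = 0.30, m_l3 = 0.40;             /* local miss rates       */

    /* Every access touches the L1; only misses propagate to the next level. */
    double e_avg = e_l1 + m_l1 * (e_l2 + m_l2 * (e_l3 + m_l3 * e_mem));

    /* 0.275 nJ with these numbers: far below the cost of one DRAM access. */
    printf("average energy per access: %.3f nJ\n", e_avg);
    return 0;
}
```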
Dynamic power in caches: Dynamic power consumption in caches (but also in other
memory structures, e.g., SRAMs, registers, CAMs) depends primarily on two factors: the
size of the memory structure (C) and its access activity (A). Size matters, since accessing
a physically larger memory requires more power even when the number of accessed bits
per access remains constant. This is simply a consequence of the larger decoders and the
longer (higher-capacitance) bit/word-lines of larger memories. At the same time, speed
is also affected by memory size as a consequence of wire delay being proportional to the
square of the wire length. Smaller memory is both faster and more power-efficient. Thus,
it is not surprising that caches optimized for speed are also fairly well sized for power.
Size optimization in caches, which affects the total capacitance C, is usually done statically via
sub-banking, bit-line segmentation (see "Sidebar: Bit-line Segmentation"), and similar techniques. CACTI,
a popular tool that analytically calculates latency, power, and area for cache organizations,
automatically performs such optimizations, giving priority to speed [204].
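As a rough illustration of why such size optimizations matter (all numbers below are invented placeholders), the per-access switching energy follows the usual C·V² relation, so activating a single sub-bank instead of the whole array scales the cost down accordingly:

```c
#include <stdio.h>

/* Rough sketch of the standard switching-energy relation E = C * V^2 per
 * access (activity factor folded in), with invented capacitance values.
 * The point: sub-banking cuts the capacitance that actually switches on
 * each access, so per-access energy tracks the bank size, not the array. */
int main(void)
{
    double vdd     = 1.0;       /* supply voltage (V), illustrative            */
    double c_total = 8e-12;     /* capacitance if the whole array switched (F) */
    int    nbanks  = 8;         /* sub-banking: only one bank is activated     */
    double c_eff   = c_total / nbanks;

    printf("monolithic access : %.2e J\n", c_total * vdd * vdd);
    printf("sub-banked access : %.2e J\n", c_eff * vdd * vdd);
    return 0;
}
```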
11 On the other hand, going to main memory incurs a significant power penalty at the chip interface because of the
chip's I/O drivers and external buses. Fortunately, because of caching, few accesses manage to reach main memory.