Optimizing Capacitance and Switching Activity to Reduce Dynamic Power - Computer Architecture Techniques for Power-Efficiency

branch frequency) are gathered in an interval on the order of 100 000 instructions. The statistics

produce two pieces of information: first, the CPI (cycles per instruction) in the current window;

second, an indication of whether a phase change occurred. A phase change is detected if the

statistics in the current window are markedly different from the ones in the previous window.

In such a case, any previously selected configuration is discarded and a configuration search

starts anew. The sensitivity of the phase detection mechanism is adjusted dynamically so that it

neither gets stuck in a single configuration nor constantly initiates new configuration searches for no

good reason.
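
The window-based phase detection described above can be sketched as follows. This is a minimal illustration, not the mechanism from the paper: the choice of counters (misses and branches), the relative-change test, and the doubling/halving of the threshold are all assumptions made for the example, as are the names WindowStats and PhaseDetector.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Counters gathered over one interval (field names are illustrative)."""
    instructions: int
    cycles: int
    misses: int
    branches: int

    @property
    def cpi(self) -> float:
        return self.cycles / self.instructions


def relative_change(old: float, new: float) -> float:
    """Relative difference between two counter values; safe when old is 0."""
    return abs(new - old) / max(abs(old), 1.0)


class PhaseDetector:
    """Compares each window with the previous one (assumed comparison rule)."""

    def __init__(self, threshold: float = 0.10, lo: float = 0.02, hi: float = 0.50):
        self.threshold = threshold  # how different two windows must be
        self.lo, self.hi = lo, hi   # assumed bounds on the threshold
        self.prev = None

    def observe(self, cur: WindowStats):
        """Return (cpi, phase_changed) for the window that just ended."""
        changed = False
        if self.prev is not None:
            changed = (
                relative_change(self.prev.misses, cur.misses) > self.threshold
                or relative_change(self.prev.branches, cur.branches) > self.threshold
            )
        self.prev = cur
        return cur.cpi, changed

    def desensitize(self) -> None:
        """Call when searches are triggered too often (assumed policy)."""
        self.threshold = min(self.hi, self.threshold * 2)

    def sensitize(self) -> None:
        """Call when one configuration has persisted too long (assumed policy)."""
        self.threshold = max(self.lo, self.threshold / 2)
```

A controller would call observe() once per window, discard the selected configuration whenever the second return value is true, and use desensitize() or sensitize() to keep the search rate reasonable.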

The search goes through the possible configurations, using each one for a whole time

window. The search starts with the 256KB 1-way L1 and progresses through the configurations

in Table 4.9, in order. The configuration search also stops if the miss rate drops below some

threshold (set to 1% in the paper). Each configuration that is tried out in a search yields a CPI,

which is stored in a table. When the search completes (either by running out of configurations

or by bringing the miss rate below the threshold) the configuration with the lowest CPI is

picked. This configuration is called “stable” and persists for the duration of the program phase.
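
The search procedure above can be sketched in a few lines. This is only an illustration under stated assumptions: run_window is a hypothetical callback that runs one time window with the given configuration and returns its CPI and miss rate, and every configuration label other than the 256KB 1-way starting point is a placeholder, since Table 4.9 is not reproduced here.

```python
def search_configuration(configs, run_window, miss_threshold=0.01):
    """Try configurations in order, one window each, and pick the lowest CPI.

    configs        -- ordered list of configuration labels (first one tried first)
    run_window     -- hypothetical callback: cfg -> (cpi, miss_rate) for one window
    miss_threshold -- stop early once the miss rate falls below this (1% in the text)
    """
    cpi_table = {}
    for cfg in configs:
        cpi, miss_rate = run_window(cfg)
        cpi_table[cfg] = cpi           # every tried configuration is recorded
        if miss_rate < miss_threshold:
            break                      # later configurations are not tried
    # The "stable" configuration is the tried one with the lowest CPI.
    stable = min(cpi_table, key=cpi_table.get)
    return stable, cpi_table


# Made-up measurements, purely for illustration.
measurements = {
    "256KB 1-way": (1.8, 0.060),
    "config B": (1.4, 0.030),
    "config C": (1.5, 0.008),   # miss rate below 1%: the search stops here
    "config D": (1.2, 0.005),   # never tried
}
stable, table = search_configuration(list(measurements), measurements.__getitem__)
```

Note that stopping early can miss a configuration with a lower CPI further down the list (config D in the made-up data); the text describes the early stop as part of the search, so the sketch keeps it.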

Balasubramonian et al. report on the performance and power consumption of their

proposal using a subset of the SPEC95, SPEC2000, and Olden benchmarks [21]. A dynamic

L1/L2 division yields no benefit for programs that have very small miss rates in the L1. But

for programs exhibiting a significant miss rate with a conventional 64KB 2-way L1, a dynamic

L1/L2 division can improve the CPI by 15% on average (and for some programs up to 50%).

This performance improvement, however, comes at a cost: a significantly higher (over 2x) energy

per instruction (EPI) for some programs. The reason is that the L1, in the best performing

configurations, is highly associative. A low-power modification to the search—selecting the

lowest associativity for a specific size—improves the situation by trading some performance

improvement for a significant reduction in energy. Projecting to 35 nm technologies and a

3-level cache hierarchy, Balasubramonian et al. show a 43% energy reduction compared to a

standard cache.
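
One plausible reading of the low-power modification is that the candidate list is pruned so that, for each L1 size, only the lowest associativity is searched. The sketch below illustrates that reading only; the (size, ways) pairs are invented for the example and the function name is hypothetical.

```python
def lowest_associativity_per_size(configs):
    """Keep, for each cache size, only the candidate with the fewest ways.

    configs -- iterable of (size_kb, ways) pairs (illustrative values only)
    Returns a list of (size_kb, ways) sorted by size.
    """
    best = {}
    for size_kb, ways in configs:
        if size_kb not in best or ways < best[size_kb]:
            best[size_kb] = ways
    return sorted(best.items())


# Invented candidate list, not taken from the paper.
candidates = [(256, 1), (256, 2), (512, 2), (512, 4), (1024, 4)]
pruned = lowest_associativity_per_size(candidates)
```

The pruned list would then be fed to the same search as before, giving up the highly associative configurations that perform best but cost the most energy per access.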

4.8.2 Selective Cache Ways

One of the key notions in Albonesi's initial complexity-adaptive proposal is that caches can

be resized by changing their associativity [7]. In parallel with the variable L1/L2 division

proposals [7, 21], Albonesi proposes a much simpler technique, specifically for reducing power

consumption. This technique, called “selective cache ways,” abandons the variable L1/L2 division

and concentrates on resizing a single cache by changing its associativity [8].

The idea of selective cache ways is rooted in two observations. First, not all of the cache

is needed all the time by all programs. In many situations, a smaller cache does (almost) as

well while consuming far less power. Second, and equally important, resizing the cache can be done
