branch frequency) are gathered in an interval on the order of 100 000 instructions. The statistics
produce two pieces of information: first, the CPI (cycles per instruction) in the current window;
second, an indication of whether a phase change occurred. A phase change is detected if the
statistics in the current window are markedly different from the ones in the previous window.
In such a case, any previously selected configuration is discarded and a configuration search
starts anew. The sensitivity of the phase detection mechanism is adjusted dynamically so that it
neither gets stuck in a single configuration nor constantly initiates new configuration searches
for no good reason.
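The window-based phase detection described above can be sketched as follows. This is a minimal illustration, not the authors' mechanism: the class names, the choice of miss and branch counts as the compared statistics, the relative-change test, and the threshold-adjustment rule (doubling after a detected change, decaying otherwise) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Counters gathered over one window (on the order of 100 000 instructions)."""
    cycles: int
    instructions: int
    misses: int
    branches: int

    @property
    def cpi(self) -> float:
        # Cycles per instruction for this window.
        return self.cycles / self.instructions


def relative_change(prev: float, cur: float) -> float:
    """Relative difference between two counter values."""
    if prev == 0:
        return 0.0 if cur == 0 else float("inf")
    return abs(cur - prev) / prev


class PhaseDetector:
    """Flags a phase change when statistics differ markedly between windows.

    The adjustment rule below is an assumed placeholder for the dynamic
    sensitivity tuning mentioned in the text.
    """

    def __init__(self, threshold: float = 0.10, lo: float = 0.02, hi: float = 0.50):
        self.threshold = threshold
        self.lo, self.hi = lo, hi
        self.prev = None

    def observe(self, stats: WindowStats) -> bool:
        prev, self.prev = self.prev, stats
        if prev is None:
            return False  # nothing to compare against yet
        change = max(
            relative_change(prev.misses, stats.misses),
            relative_change(prev.branches, stats.branches),
        )
        changed = change > self.threshold
        if changed:
            # Assumed rule: become less sensitive to avoid constant re-searching.
            self.threshold = min(self.hi, self.threshold * 2.0)
        else:
            # Assumed rule: become more sensitive to avoid getting stuck.
            self.threshold = max(self.lo, self.threshold * 0.9)
        return changed
```

When `observe` returns `True`, the caller would discard the current configuration and start a new search.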
The search goes through the possible configurations, using each one for a whole time
window. The search starts with the 256KB 1-way L1 and progresses through the configurations
in Table 4.9, in order. The configuration search also stops if the miss rate drops below some
threshold (set to 1% in the paper). Each configuration that is tried out in a search yields a CPI,
which is stored in a table. When the search completes (either by running out of configurations
or by bringing the miss rate below the threshold) the configuration with the lowest CPI is
picked. This configuration is called “stable” and persists for the duration of the program phase.
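The search loop can be sketched as below, under stated assumptions. The function `search_configuration` and the callback `run_window` are hypothetical names; `run_window` stands in for running one full time window under a given configuration and returning its CPI and miss rate. The ordered list of configurations (Table 4.9 in the text) is supplied by the caller and is not reproduced here.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple


def search_configuration(
    configs: Sequence[Hashable],
    run_window: Callable[[Hashable], Tuple[float, float]],
    miss_threshold: float = 0.01,
) -> Tuple[Hashable, Dict[Hashable, float]]:
    """Try configurations in order and return the one with the lowest CPI.

    The search ends when the configurations run out or when the measured
    miss rate drops below `miss_threshold` (1% in the text).
    """
    if not configs:
        raise ValueError("at least one configuration is required")

    cpi_table: Dict[Hashable, float] = {}
    for config in configs:
        # Each configuration is used for one whole time window.
        cpi, miss_rate = run_window(config)
        cpi_table[config] = cpi
        if miss_rate < miss_threshold:
            break  # miss rate is low enough; stop searching

    # The "stable" configuration is the one with the lowest recorded CPI.
    stable = min(cpi_table, key=cpi_table.get)
    return stable, cpi_table
```

Note that the configuration that ends the search is not necessarily the one selected: the choice is made on CPI, and the miss rate only decides when to stop.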
Balasubramonian et al. report on the performance and power consumption of their
proposal using a subset of the SPEC95, SPEC2000, and Olden benchmarks [21]. A dynamic
L1/L2 division yields no benefit for programs that have very small miss rates in the L1. But
for programs exhibiting a significant miss rate with a conventional 64KB 2-way L1, a dynamic
L1/L2 division can improve the CPI by 15% on average (and for some programs up to 50%).
This performance improvement, however, comes at a cost: a significantly higher (over 2x) energy
per instruction (EPI) for some programs. The reason is that the L1, in the best performing
configurations, is highly associative. A low-power modification to the search—selecting the
lowest associativity for a specific size—improves the situation by trading some performance
improvement for a significant reduction in energy. Projecting to 35 nm technologies and a
3-level cache hierarchy, Balasubramonian et al. show a 43% energy reduction compared to a
standard cache.
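One possible reading of the low-power modification is a filter applied to the candidate list before the search: for each L1 size, only the lowest-associativity option is kept. The sketch below follows that reading. The function name and the representation of a configuration as a `(size_kb, ways)` pair are assumptions for illustration, and any values passed in are hypothetical rather than taken from Table 4.9.

```python
from typing import Dict, Iterable, List, Tuple


def lowest_associativity_per_size(
    configs: Iterable[Tuple[int, int]],
) -> List[Tuple[int, int]]:
    """Keep, for each cache size, only the configuration with the fewest ways.

    `configs` is an iterable of (size_kb, ways) pairs. Sizes are returned
    in the order in which they first appear.
    """
    fewest_ways: Dict[int, int] = {}
    for size_kb, ways in configs:
        if size_kb not in fewest_ways or ways < fewest_ways[size_kb]:
            fewest_ways[size_kb] = ways
    return list(fewest_ways.items())
```

The filtered list can then be searched in the usual way, trading some of the CPI improvement for lower energy per instruction.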
4.8.2 Selective Cache Ways
One of the key notions in Albonesi's initial complexity-adaptive proposal is that caches can
be resized by changing their associativity [7]. In parallel with the variable L1/L2 division
proposals [7, 21], Albonesi proposes a much simpler technique, specifically for reducing power
consumption. This technique, called “selective cache ways,” abandons the variable L1/L2 division
and concentrates on resizing a single cache by changing its associativity [8].
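To make resizing by associativity concrete, here is a toy model of a set-associative cache in which the number of enabled ways can be changed at run time. It is a sketch under simplifying assumptions, not Albonesi's design: the class name, the LRU replacement policy, and the 32-byte default line size are chosen for illustration, and the model ignores dirty data and power entirely.

```python
class SelectiveWayCache:
    """Toy set-associative cache whose number of enabled ways is adjustable."""

    def __init__(self, num_sets: int, num_ways: int, line_bytes: int = 32):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.line_bytes = line_bytes
        self.enabled_ways = num_ways
        # Each set holds tags in LRU order: least recent first, most recent last.
        self.sets = [[] for _ in range(num_sets)]

    def set_enabled_ways(self, ways: int) -> None:
        """Resize the cache by enabling only `ways` ways per set."""
        if not 1 <= ways <= self.num_ways:
            raise ValueError("ways must be between 1 and num_ways")
        self.enabled_ways = ways
        for lines in self.sets:
            # Keep only the most recently used lines that still fit.
            del lines[:-ways]

    def capacity_bytes(self) -> int:
        """Effective capacity with the currently enabled ways."""
        return self.num_sets * self.enabled_ways * self.line_bytes

    def access(self, address: int) -> bool:
        """Return True on a hit, False on a miss (the line is then filled)."""
        block = address // self.line_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.remove(tag)
            lines.append(tag)  # mark as most recently used
            return True
        if len(lines) >= self.enabled_ways:
            lines.pop(0)  # evict the least recently used line
        lines.append(tag)
        return False
```

Halving the enabled ways halves the effective capacity while leaving the number of sets, and therefore the address-to-set mapping, unchanged.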
The idea of selective cache ways is rooted in two observations. First, not all of the cache
is needed all the time by all programs. In many situations, a smaller cache does (almost) as
well, consuming far less power. Second, and equally important, resizing the cache can be done