Managing Static (Leakage) Power - Computer Architecture Techniques for Power-Efficiency


on architectural parameters. Alternative counter-based time metrics have been explored by Kharbutli and Solihin for managing replacements [135], and the approach can easily be extended to manage leakage.
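To make the idea of an event-based time metric concrete, here is a minimal sketch. It assumes, purely for illustration, that a line's "age" is the number of accesses to its set since the line was last touched, and that the line is gated off once that count reaches a threshold. The class `EventDecaySet` and its parameters are hypothetical and are not the counters used in [135]. Tags are kept in the simulation even for gated lines, so that decay-induced misses can be told apart from ordinary misses.

```python
class EventDecaySet:
    """One cache set whose lines decay after `threshold` accesses to *other*
    lines of the set (an event count), instead of after a number of cycles.
    Hypothetical illustration only."""

    def __init__(self, ways, threshold):
        self.ways = ways
        self.threshold = threshold
        self.tags = [None] * ways      # kept even when gated, to classify misses
        self.powered = [False] * ways
        self.idle = [0] * ways         # set accesses since this way was last used
        self.lru = list(range(ways))   # least recently used first

    def _touch(self, way):
        self.lru.remove(way)
        self.lru.append(way)
        self.idle[way] = 0
        self.powered[way] = True

    def access(self, tag):
        # Every access to the set ages the other powered lines by one event.
        for w in range(self.ways):
            if self.powered[w] and self.tags[w] != tag:
                self.idle[w] += 1
                if self.idle[w] >= self.threshold:
                    self.powered[w] = False   # gated: contents are lost
        if tag in self.tags:
            w = self.tags.index(tag)
            outcome = "hit" if self.powered[w] else "decay_miss"
            self._touch(w)
            return outcome
        # Ordinary miss: prefer a gated way, otherwise evict the LRU way.
        gated = [w for w in self.lru if not self.powered[w]]
        w = gated[0] if gated else self.lru[0]
        self.tags[w] = tag
        self._touch(w)
        return "miss"


s = EventDecaySet(ways=2, threshold=2)
print([s.access(t) for t in "ABBA"])
```

With a threshold of 2, the sequence A, B, B, A ends in a decay-induced miss on A; raising the threshold to 10 turns that last access back into a hit. Counting events rather than cycles makes the "time" metric independent of clock rate and stall behavior, at the cost of updating every line in the set on each access.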

Need for the unfiltered reference stream to reliably detect generational behavior. Decay works well only when the distribution of inter-access times is bimodal (short gaps while a line is live, long gaps once it is dead), as it is in the L1. Things get muddier in the L2, where the generational behavior of cache lines is obscured by L1 filtering. In fact, what is observed in the L2 is either the behavior of L1 conflicts or the generational behavior of lines that are accessed on a very different time scale than the L1 lines. Despite the initial assessment that decay works well in the L2, albeit with very large decay intervals, Abella et al. exposed the problems and proposed a new approach for L2 decay [1].
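The filtering effect can be seen with a small sketch that passes a block-address trace through a direct-mapped L1 and records only the L1 misses, which are the references the L2 actually receives. The L1 size, the trace, and the function names `l2_stream` and `gaps` are hypothetical choices made for illustration.

```python
def l2_stream(trace, l1_sets):
    """Filter a block-address trace through a direct-mapped L1 with `l1_sets`
    lines; return (time, block) pairs for the L1 misses, i.e., the L2 stream."""
    l1 = [None] * l1_sets
    out = []
    for t, block in enumerate(trace):
        idx = block % l1_sets
        if l1[idx] != block:          # L1 miss: this reference reaches the L2
            l1[idx] = block
            out.append((t, block))
    return out


def gaps(stream, block):
    """Inter-access times of one block in a (time, block) stream."""
    times = [t for t, b in stream if b == block]
    return [later - earlier for earlier, later in zip(times, times[1:])]


trace = [0, 1, 0, 2, 0, 3, 0, 4, 0, 5]
print(gaps(list(enumerate(trace)), 0))   # what the L1 sees
print(gaps(l2_stream(trace, 4), 0))      # what the L2 sees
```

Block 0 is touched every 2 cycles at the L1 ([2, 2, 2, 2]), but the L2 sees a single interval of 8 cycles ([8]). That interval is produced by the L1 conflict with block 4, not by block 0's own generational behavior.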

5.2.3 Adaptive Cache Decay and Adaptive Mode Control

Although cache decay is capable of shutting off a significant part of the cache with a small performance impact, an aggressively small decay interval can cause a jump in the number of decay-induced misses, destroying its advantage over cache resizing. On the other hand, a conservatively large decay interval misses the opportunity to turn off cache lines that are already in their dead time. Cache decay also carries a fixed overhead compared with an oracle that predicts dead lines perfectly. This is because cache decay must wait for the full decay interval after the last access to a (dead) cache line before shutting it off. This “missed opportunity” to save leakage grows with larger decay intervals. In contrast, an oracle knows immediately when a line enters its dead time and wastes no time before it starts saving leakage.
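The trade-off between decay-induced misses and missed opportunity can be tallied for a single cache-line generation with the simplified model below. It assumes that a line is switched off exactly `interval` cycles after its most recent access, that every live gap longer than the interval costs one decay-induced miss, and that the oracle switches the line off for its whole dead time and never during its live time. Refetch latency and the way an induced miss reshapes the generation are ignored. The function name `decay_outcome` is hypothetical.

```python
def decay_outcome(accesses, evict, interval):
    """Simplified decay model for one cache-line generation.

    accesses: sorted cycles at which the line is accessed (its live time)
    evict:    cycle at which the line is replaced (end of its dead time)
    interval: decay interval in cycles
    Returns (off_cycles_decay, off_cycles_oracle, decay_induced_misses).
    """
    off = 0
    induced = 0
    # Live period: a gap longer than the interval switches the line off
    # prematurely, saving leakage but causing a miss on the next access.
    for prev, nxt in zip(accesses, accesses[1:]):
        gap = nxt - prev
        if gap > interval:
            off += gap - interval
            induced += 1
    # Dead period: decay waits `interval` cycles; the oracle turns off at once.
    dead = evict - accesses[-1]
    off += max(0, dead - interval)
    return off, dead, induced


for interval in (50, 1_000, 8_000):
    print(interval, decay_outcome([0, 100, 200, 300], 10_300, interval))
```

With a 50-cycle interval the line is off even longer than under the oracle (10 100 versus 10 000 cycles), but only because each of the three live gaps triggers a decay-induced miss. With an 8 000-cycle interval there are no induced misses, yet 8 000 of the 10 000 dead cycles are lost to waiting; 1 000 cycles sits between the two.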

It is clear that tuning the decay interval is critical to making decay work well across applications, or even across phases of a single application. Zhou et al. found, by trial and error, that suitable decay intervals vary significantly across SPEC2000 benchmarks. For instance, to keep the performance penalty below 4% in a 64KB four-way set-associative cache, decay intervals range from 14 000 cycles for JPEG to 98 000 cycles for LI [250].
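Choosing one interval per application from profiling data can be sketched as a sweep over candidate intervals: estimate the decay-induced misses for each, and keep the smallest interval whose estimated slowdown stays under a bound such as the 4% mentioned above. The sketch assumes that each profiled inter-access gap longer than the interval costs one decay-induced miss at a fixed `miss_penalty`, and that a line is off for whatever part of a gap or dead period exceeds the interval. The function `pick_decay_interval` and every input number are hypothetical, not data from [250].

```python
def pick_decay_interval(live_gaps, dead_times, candidates, miss_penalty,
                        runtime, max_slowdown=0.04):
    """Offline, per-application choice of a single decay interval.

    live_gaps:  list of inter-access times observed while lines were live
    dead_times: list of dead-period lengths (last access to eviction)
    Returns (interval, off_cycles) for the smallest candidate whose estimated
    slowdown is within max_slowdown, or None if no candidate qualifies.
    """
    budget = max_slowdown * runtime  # extra cycles we are willing to lose
    # In this model off time only shrinks as the interval grows, so the
    # smallest feasible interval is also the one that saves the most leakage.
    for interval in sorted(candidates):
        induced = sum(1 for g in live_gaps if g > interval)
        if induced * miss_penalty <= budget:
            off = sum(max(0, t - interval) for t in live_gaps + dead_times)
            return interval, off
    return None


print(pick_decay_interval(live_gaps=[500, 2_000, 2_000, 9_000],
                          dead_times=[50_000],
                          candidates=[1_000, 4_000, 16_000],
                          miss_penalty=2_000, runtime=100_000))
# (4000, 51000)
```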

The selection of a decay interval for an application is thus a non-trivial task that must balance the increase in dynamic power and the performance loss against the gains in leakage savings. Furthermore, a single decay interval derived from a profiling run of an application may not be optimal for every input data set, or even for different phases of the application. To avoid passing this burden to the user (programmers, compilers, operating systems), adaptive hardware mechanisms have been proposed to adjust the decay interval dynamically. Of the many proposals in the literature, we describe three early ones here: (1) local, per-line adaptation of the decay interval [127], (2) global adaptation of the decay interval based on application performance feedback [250], and (3) a generalization of the global adaptation approach using control theory [219].
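The general idea behind global, feedback-driven adaptation can be illustrated with a small controller. This is a generic sketch under stated assumptions, not the mechanism of [250] or [219]. Once per epoch it compares the decay-induced misses against a hypothetical target fraction of the misses the cache would have incurred anyway. It doubles the decay interval when the target is exceeded and halves it when induced misses fall well below the target. It assumes the hardware can tell the two kinds of misses apart, for example by keeping tags powered. The function `adapt_interval`, the target ratio, the dead band, and the clamping bounds are all illustrative choices.

```python
def adapt_interval(interval, induced_misses, baseline_misses,
                   target_ratio=0.1, lo=1_000, hi=1_000_000):
    """One epoch of a hypothetical global decay-interval controller.

    induced_misses:  misses caused by decay during the last epoch
    baseline_misses: misses the cache would have incurred without decay
    """
    if induced_misses > target_ratio * baseline_misses:
        interval *= 2            # decay is hurting performance: back off
    elif induced_misses < 0.5 * target_ratio * baseline_misses:
        interval //= 2           # well under target: decay more aggressively
    # Between the two thresholds the interval is left alone (a dead band),
    # which keeps the controller from oscillating every epoch.
    return max(lo, min(hi, interval))


interval = 8_000
for induced, baseline in [(50, 100), (7, 100), (2, 100), (0, 100)]:
    interval = adapt_interval(interval, induced, baseline)
    print(interval)
```

In this run the interval doubles to 16 000 after an epoch with many induced misses, holds steady while the miss ratio sits in the dead band, and then shrinks to 8 000 and 4 000 as induced misses fall. Multiplicative steps reach the right order of magnitude quickly, which matters because suitable intervals can differ by several times across applications.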
