on architectural parameters. The use of alternative counter-based time metrics is
explored by Kharbutli and Solihin for managing replacements [ 135 ] but can be
easily extended to manage leakage.
Need for the unfiltered reference stream to reliably detect generational behavior. Decay
works well only when the distribution of inter-access times is bimodal, as it is in the
L1. Things get muddier in the L2 where the generational behavior of cache lines is
obscured because of L1 filtering. In fact, what is observed in the L2 is the behavior
of L1 conflicts or the generational behavior of lines which are accessed on a much
different time scale than the L1 lines. Despite the initial assessment that decay
works well in the L2, albeit with very large decay intervals, Abella et al. exposed
the problems and proposed a new approach for L2 decay [ 1 ].
5.2.3 Adaptive Cache Decay and Adaptive Mode Control
Although cache decay is capable of shutting off a significant part of the cache with a small
performance impact, an aggressively small decay interval can cause a jump in the number
of decay-induced misses, destroying its advantage over cache resizing. On the other hand, a
conservatively large decay interval misses the opportunity to turn off cache lines already in their
dead time. Cache decay also carries a fixed overhead over an oracle prediction for dead lines.
This is because cache decay has to wait for the length of the decay interval from the last access
to a (dead) cache line to shut it off. This “missed opportunity” to save leakage increases with
larger decay intervals. In contrast, an oracle prediction knows immediately when a line enters
its dead time and wastes no time to start saving leakage.
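The fixed overhead relative to an oracle can be made concrete with a small calculation. The sketch below is a toy model, not taken from the cited work: it assumes each dead line is characterized only by its dead time (cycles from the last access to eviction), and the function names and sample numbers are hypothetical. Under decay, a line is off for its dead time minus the decay interval (or not at all if the dead time is shorter), whereas the oracle turns it off for the entire dead time.

```python
def decay_off_cycles(dead_time, decay_interval):
    """Cycles a dead line spends powered off under decay.

    Decay waits one full decay interval after the last access before
    turning the line off, so only the remainder of the dead time is saved.
    """
    return max(0, dead_time - decay_interval)


def oracle_off_cycles(dead_time):
    """An oracle turns the line off the moment it enters its dead time."""
    return dead_time


def missed_opportunity(dead_times, decay_interval):
    """Total off-cycles the oracle obtains that decay does not."""
    return sum(
        oracle_off_cycles(d) - decay_off_cycles(d, decay_interval)
        for d in dead_times
    )


# Hypothetical dead times (in cycles) for three cache-line generations.
dead_times = [5_000, 40_000, 200_000]

for interval in (8_000, 64_000):
    saved = sum(decay_off_cycles(d, interval) for d in dead_times)
    lost = missed_opportunity(dead_times, interval)
    print(f"decay interval {interval}: off cycles {saved}, missed {lost}")
```

In this model the per-line loss is min(dead time, decay interval), which is why the missed opportunity grows as the decay interval grows.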
It is clear that tuning the decay interval is critical in making decay work well for different
applications, or even for different phases of an application. Zhou et al. found, by trial and error,
that decay intervals vary significantly for SPEC2000 benchmarks. For instance, to keep the
performance penalty below 4%, decay intervals in a 64KB four-way set-associative cache range
from 14 000 cycles for JPEG to 98 000 cycles for LI [ 250 ].
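A trial-and-error search of this kind can be illustrated with a small sweep. The following is only a sketch under simplifying assumptions, not the procedure used in [ 250 ]: it uses a budget on decay-induced misses as a stand-in for the performance penalty, models each generation as a list of live-time inter-access gaps plus a dead time, and all names and numbers are hypothetical.

```python
def evaluate(generations, decay_interval):
    """Return (decay_induced_misses, off_cycles) for one decay interval.

    generations: list of (live_gaps, dead_time) pairs, where live_gaps are
    the inter-access gaps while the line is still live and dead_time is the
    span from the last access to eviction, all in cycles.
    """
    misses = 0
    off_cycles = 0
    for live_gaps, dead_time in generations:
        for gap in live_gaps:
            if gap > decay_interval:
                # The line was turned off while still live: the next
                # access misses, but some leakage was saved meanwhile.
                misses += 1
                off_cycles += gap - decay_interval
        off_cycles += max(0, dead_time - decay_interval)
    return misses, off_cycles


def smallest_interval_within_budget(generations, candidates, max_misses):
    """Pick the smallest candidate interval that stays within the miss budget."""
    for interval in sorted(candidates):
        misses, off_cycles = evaluate(generations, interval)
        if misses <= max_misses:
            return interval, misses, off_cycles
    return None  # no candidate satisfies the budget


# Hypothetical trace: three generations of one cache line.
generations = [
    ([100, 300, 12_000], 90_000),
    ([50, 2_000], 150_000),
    ([20_000, 400], 60_000),
]
candidates = [1_000, 4_000, 16_000, 64_000]

print(smallest_interval_within_budget(generations, candidates, max_misses=1))
print(smallest_interval_within_budget(generations, candidates, max_misses=0))
```

Tightening the budget pushes the chosen interval up and the off cycles down, which is the tension between decay-induced misses and leakage savings described above.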
The selection of a decay interval for an application is thus a non-trivial task that must
balance the dynamic power increase and performance loss against the gains in leakage savings. Furthermore,
a single decay interval for an application derived from a profiling run is possibly not optimal for
every input data set or even for different phases of the application. To avoid passing this burden
to the user (programmers, compilers, operating systems), adaptive hardware mechanisms have
been proposed to adjust the decay interval dynamically. While there are a number of proposals in
the literature, here we describe three initial proposals: (1) local per-line decay interval adaptation
[ 127 ], (2) global adaptation of the decay interval based on application performance feedback
[ 250 ], and (3) a generalization of the global adaptation approach using control theory [ 219 ].