instructions in the basic block, but also breaks the dependence chains in it, returning results in
a single cycle. In addition to the energy saved by not executing instructions in functional units,
considerable energy can also be saved because all the bookkeeping activities in the processor
(instruction pointer update, instruction fetch, decode, rename, issue, etc.) during the execution
of a basic block are eliminated. Of course, it is much more expensive to access and match entries
in the BHB since each entry consists of arrays of values and valid bits [107].
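As a rough software analogy (not the hardware mechanism itself), basic block reuse resembles memoization keyed on the block's starting address and its input values; all names below are illustrative:

```python
# Illustrative software analogy of a basic block history buffer (BHB):
# cache a block's outputs keyed on its address and live-in values, so a
# repeated invocation with identical inputs skips re-execution entirely.
# All names here are hypothetical; a real BHB holds arrays of values and
# valid bits in hardware.

bhb = {}  # (block_address, live_in_values) -> live_out_values

def run_block(block_address, live_ins, block_fn):
    """Return the block's outputs, reusing a prior result when inputs match."""
    key = (block_address, live_ins)
    if key in bhb:              # hit: result returned without executing the block
        return bhb[key]
    outs = block_fn(live_ins)   # miss: execute the block normally, then record it
    bhb[key] = outs
    return outs

# A toy basic block with a dependence chain: t feeds both outputs.
def block_42(live_ins):
    a, b = live_ins
    t = a + b
    return (t * 2, t - 1)

first = run_block(0x42, (3, 4), block_42)   # executed: (14, 6)
again = run_block(0x42, (3, 4), block_42)   # reused from the buffer: (14, 6)
assert first == again == (14, 6)
```

A hardware BHB performs this lookup in parallel with fetch; the dictionary here only illustrates the reuse condition, namely that identical inputs imply identical outputs for the same block.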
Trace level: similar to basic block reuse is the trace-level reuse proposed by Gonzalez, Tubella, and Molina [86]. Traces are groups of consecutive instructions reflecting not their position in the static code layout but their order in dynamic execution. A trace may span more
than one basic block by allowing executed branches (taken or non-taken) in the middle of the
trace. As with basic blocks, a trace can start with the same inputs, read the same values from memory, and produce the same results and side-effects (e.g., memory writes). Trace-level reuse has problems and benefits analogous to those of basic block reuse, only amplified because traces can be longer.
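Continuing the same illustrative analogy, a trace-reuse entry can be thought of as widening the reuse key with the outcomes of the branches executed inside the trace, since a trace is identified by its dynamic path rather than its static layout (names below are hypothetical):

```python
# Illustrative analogy of trace-level reuse: the key covers the dynamic path
# (start address plus branch outcomes) in addition to the live-in values.

trace_buffer = {}  # (start_addr, branch_outcomes, live_ins) -> outputs

def run_trace(start_addr, branch_outcomes, live_ins, trace_fn):
    """Reuse a trace's outputs when address, path, and inputs all match."""
    key = (start_addr, branch_outcomes, live_ins)
    if key in trace_buffer:
        return trace_buffer[key]
    outs = trace_fn(live_ins)
    trace_buffer[key] = outs
    return outs

# Two dynamic traces starting at the same address but following different
# branch outcomes, (True,) vs. (False,), are distinct reuse entries.
inc = lambda ins: (ins[0] + 1,)
dec = lambda ins: (ins[0] - 1,)
assert run_trace(0x100, (True,),  (5,), inc) == (6,)
assert run_trace(0x100, (False,), (5,), dec) == (4,)
assert len(trace_buffer) == 2
```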
4.10.2 Filter Cache
In 1997, Kin, Gupta, and Mangione-Smith proposed one of the first purely architectural
techniques to reduce power in cache hierarchies. Called the Filter Cache [142], the idea takes the memory hierarchy characteristic of satisfying accesses in smaller structures to the extreme. The filter cache is a tiny cache (128–256 bytes) that filters the processor's reference stream in a very power-efficient manner, trading performance for power to yield a better EDP.
The filter cache is inserted between the processor and the L1 which now has a longer latency
being farther away from the processor. The original high-performance/higher-consumption
configuration with the L1 immediately next to the processor can be restored by simply bypassing
the filter cache.
The filter cache satisfies at full speed a significant percentage of the processor's references
(about 60% reported in [142]) very economically; but the remaining references that slip through to the
L1 are slower. The reduced performance due to these slower L1 accesses unavoidably increases
program run time. Obviously, the energy benefit of the filter cache must not be outweighed by the extra energy consumed by the longer-running program, if the overall Energy × Delay of the processor is to be improved. A successful filter cache must strike a delicate balance between its
performance (i.e., its hit rate) and its power. A very small filter cache, such as a line buffer—a
degenerate case—although quite power efficient, slows down the majority of the accesses that
miss in it. This is likely to hurt EDP. On the other hand, immoderately increasing the filter
cache's size, or employing full-associativity to increase its hit rate, will seriously diminish its
power benefits. A large size increases C , while full associativity increases A since multiple tags
must be compared simultaneously.
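The balance can be made concrete with a back-of-envelope model; the energies, latencies, and hit rate below are assumed round numbers for illustration, not measurements from [142]:

```python
# Back-of-envelope EDP model for a filter cache. All numbers are assumed,
# illustrative values; only the ~60% hit rate is loosely inspired by the text.

def avg_energy_delay(hit_rate, e_filter, e_l1, t_filter, t_l1):
    """Per-access average energy and delay with a filter cache before the L1.
    A filter-cache miss pays both the filter and the (slower) L1 costs."""
    energy = e_filter + (1 - hit_rate) * e_l1
    delay = t_filter + (1 - hit_rate) * t_l1
    return energy, delay

# Baseline: every access goes straight to a fast L1 (1 cycle, 1.0 energy unit).
e_base, t_base = 1.0, 1.0
edp_base = e_base * t_base

# With filter cache: the tiny cache is cheap (0.1 units) and fast (1 cycle),
# but the L1 behind it is now slower (2 extra cycles on a miss).
e, t = avg_energy_delay(hit_rate=0.6, e_filter=0.1, e_l1=1.0,
                        t_filter=1.0, t_l1=2.0)
edp_filter = e * t

print(f"baseline EDP: {edp_base:.2f}, filter-cache EDP: {edp_filter:.2f}")
```

In this toy setting the filter cache raises average delay (1.8 vs. 1 cycle) but halves average energy (0.5 vs. 1.0), so EDP still improves (0.90 vs. 1.00); shifting the hit rate, the filter cache's own energy, or the L1 penalty can easily reverse the outcome.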