Optimizing Capacitance and Switching Activity to Reduce Dynamic Power - Computer Architecture Techniques for Power-Efficiency


multiplies, and 50% of FP divisions are memoizable and can be “performed” in a single cycle

with small (32-entry, 4-way set-associative) Memo-tables [56].
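
A Memo-table of the size quoted above can be modelled in a few lines. The following Python sketch is only an illustration and makes several assumptions that are not in the text: the index function (`_index`), the LRU replacement policy, the use of full operand bit patterns as tags, and all class and method names are hypothetical.

```python
import struct


def _bits(x):
    """Return the raw 64-bit pattern of a Python float (IEEE 754 double)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


class MemoTable:
    """Sketch of a small set-associative memo table for FP multiplies."""

    def __init__(self, entries=32, ways=4):
        self.ways = ways
        self.num_sets = entries // ways  # 32 entries / 4 ways = 8 sets
        # Each set holds up to `ways` (tag, result) pairs, least recently used first.
        self.sets = [[] for _ in range(self.num_sets)]
        self.lookups = 0
        self.hits = 0

    def _index(self, tag):
        # Assumed index function: fold the two operand bit patterns together.
        a, b = tag
        x = a ^ b
        x ^= x >> 32
        x ^= x >> 16
        return x % self.num_sets

    def multiply(self, a, b):
        self.lookups += 1
        tag = (_bits(a), _bits(b))  # compare bit patterns, so 0.0 and -0.0 differ
        ways = self.sets[self._index(tag)]
        for i, (stored_tag, result) in enumerate(ways):
            if stored_tag == tag:
                ways.append(ways.pop(i))  # mark as most recently used
                self.hits += 1
                return result  # hit: the multiplier is not exercised
        result = a * b  # miss: perform the multiplication
        if len(ways) == self.ways:
            ways.pop(0)  # evict the least recently used entry
        ways.append((tag, result))
        return result


table = MemoTable()
for a, b in [(1.5, 2.0), (3.0, 4.0), (1.5, 2.0)]:
    table.multiply(a, b)
print(table.hits, table.lookups)  # prints: 1 3
```

Every hit stands for a multiplication whose result is read from the table rather than recomputed, which is where a saving in switching activity would come from.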

Instruction level: Seminal work on dynamic instruction reuse was done by Sodani and Sohi

[208]. The observation in their work is that many dynamically executed instructions (or groups

of instructions) operate on the same inputs. Sodani and Sohi were led to the discovery of this

property by examining how execution proceeds in dynamically scheduled superscalar processors.

In particular, they noticed that execution in a mispredicted path converges with execution in

the correct path, with the result that some of the instructions beyond the point of convergence are

executed twice, verbatim, in the case of a misprediction. Furthermore, the iterative nature of

programs in conjunction with the way code is written modularly to operate on different inputs

results in significant repetition of the same inputs for the same instructions.

As with operation memoization, the results of such instructions can be saved and

simply reused when needed rather than re-executing the computation. Sodani and Sohi claim

that in some cases over 50% of the instructions can be reused in this way. Although their work

is also focused on performance, the implications of instruction reuse on power consumption

can be quite important with such a large reuse rate.

Sodani and Sohi propose three schemes to implement instruction reuse. The first two are

simply caches of inputs and results called Reuse Buffers (RB). One bases its reuse test on input

values: upon seeing the same input values for an instruction, the stored result is reused. The second

simplifies the reuse test and reduces the required storage space per RB entry by relying not

on input values but on input register names. Reuse of an instruction depends on whether it

operates on the same registers as before. RB entries in this case are invalidated when registers

are written. In both schemes, the reuse of a load is predicated upon the corresponding memory

location not having been written. RB entries corresponding to loads are thus invalidated when

their address is written. Finally, the third scheme takes into account not only register names but

also dependence chains to track the reuse status of such instruction chains. It carries, however,

considerable complexity and hence increased power consumption.
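
The first two schemes can be sketched as follows. This is a minimal illustration and not Sodani and Sohi's implementation: indexing entries by instruction address (`pc`), the unbounded dictionary standing in for a finite hardware table, and all class and method names are assumptions made for the example. The third, dependence-chain scheme is not modelled.

```python
class ReuseBuffer:
    """Common part of both schemes (assumed: entries indexed by instruction address)."""

    def __init__(self):
        # pc -> {"key": ..., "result": ..., "load_addr": address or None}
        self.entries = {}

    def _store(self, pc, key, result, load_addr):
        self.entries[pc] = {"key": key, "result": result, "load_addr": load_addr}

    def memory_written(self, addr):
        # Both schemes: a store invalidates reuse entries of loads from that address.
        stale = [pc for pc, e in self.entries.items() if e["load_addr"] == addr]
        for pc in stale:
            del self.entries[pc]


class ValueReuseBuffer(ReuseBuffer):
    """Scheme 1: the reuse test compares current input values with stored ones."""

    def record(self, pc, operand_values, result, load_addr=None):
        self._store(pc, tuple(operand_values), result, load_addr)

    def lookup(self, pc, operand_values):
        entry = self.entries.get(pc)
        if entry is not None and entry["key"] == tuple(operand_values):
            return entry["result"]  # same inputs as before: reuse the result
        return None  # miss: the instruction must execute


class NameReuseBuffer(ReuseBuffer):
    """Scheme 2: an entry stays valid until one of its source registers is written."""

    def record(self, pc, src_regs, result, load_addr=None):
        self._store(pc, frozenset(src_regs), result, load_addr)

    def lookup(self, pc):
        entry = self.entries.get(pc)
        return entry["result"] if entry is not None else None

    def register_written(self, reg):
        # Invalidate every entry that names the written register as a source.
        stale = [pc for pc, e in self.entries.items() if reg in e["key"]]
        for pc in stale:
            del self.entries[pc]


# Name-based scheme: "add r3, r1, r2" at a hypothetical address 0x40.
rb = NameReuseBuffer()
rb.record(0x40, ["r1", "r2"], 7)
rb.register_written("r3")      # the destination write leaves the entry valid
reused = rb.lookup(0x40)       # 7: sources untouched, result can be reused
rb.register_written("r1")      # a source register changes
after_write = rb.lookup(0x40)  # None: the entry was invalidated
```

The name-based lookup needs no operand comparison at all, which is the simplification the text describes. The price is that an entry is discarded whenever a source register is rewritten, even if the new value happens to equal the old one.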

Basic block level: Huang and Lilja take reuse one step further and discuss basic block reuse

[107]. Their observations concern whole basic blocks for which they found that their inputs

and outputs can be quite regular and predictable. Their studies show that for the SPEC95

benchmarks, a vast majority of basic blocks (90%) have few input and output registers (up to

four and five, respectively) and only read and write few memory locations (up to four and two,

respectively).

Similarly to the RB, a block history buffer (BHB) stores inputs and outputs of

basic blocks and provides reuse at the basic block level. The increased number of inputs that

must match for the result to be determinable means that basic block reuse is not as prevalent as

instruction reuse. However, when reuse succeeds, it not only avoids the execution of the individual
