multiplies, and 50% of FP divisions are memoizable and can be “performed” in a single cycle
with small (32-entry, 4-way set-associative) Memo-tables [56].
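To make the idea concrete, the following is a minimal software sketch of a memo-table, not the hardware design of [56]. Only the 32-entry, 4-way set-associative geometry comes from the text; the `MemoTable` class, its `execute` method, the use of Python's `hash` as the index function, and LRU replacement are all assumptions made for illustration.

```python
class MemoTable:
    """Toy model of a set-associative memo-table for one operation type."""

    def __init__(self, entries=32, ways=4):
        self.ways = ways
        self.num_sets = entries // ways  # 32 entries / 4 ways = 8 sets
        # Each set holds (operand_a, operand_b, result), least recent first.
        self.sets = [[] for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def _set_for(self, a, b):
        # Assumed index function: hash the operand pair into a set number.
        return self.sets[hash((a, b)) % self.num_sets]

    def execute(self, a, b, op):
        entries = self._set_for(a, b)
        for i, (tag_a, tag_b, result) in enumerate(entries):
            if tag_a == a and tag_b == b:
                # Hit: the operation is "performed" by a table lookup.
                entries.append(entries.pop(i))  # mark as most recently used
                self.hits += 1
                return result
        # Miss: compute the result and remember it for next time.
        self.misses += 1
        result = op(a, b)
        if len(entries) == self.ways:
            entries.pop(0)  # assumed policy: evict the least recently used
        entries.append((a, b, result))
        return result


def fp_divide(a, b):
    return a / b


table = MemoTable()
first = table.execute(6.0, 3.0, fp_divide)   # miss: the division is computed
second = table.execute(6.0, 3.0, fp_divide)  # hit: the stored result is reused
```

In this sketch the second division with the same operands is served from the table, which corresponds to the single-cycle case described above.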
Instruction level: Seminal work on dynamic instruction reuse was done by Sodani and Sohi
[208]. Their observation is that many dynamically executed instructions (or groups
of instructions) operate on the same inputs. Sodani and Sohi were led to the discovery of this
property by examining how execution proceeds in dynamically scheduled superscalar processors.
In particular, they noticed that execution in a mispredicted path converges with execution in
the correct path, so that some of the instructions beyond the point of convergence are
executed twice, verbatim, in the case of a misprediction. Furthermore, the iterative nature of
programs, in conjunction with the way code is written modularly to operate on different inputs,
results in significant repetition of the same inputs for the same instructions.
As with operation memoization, the results of such instructions can be saved and
simply reused when needed instead of re-executing the computation. Sodani and Sohi claim
that in some cases over 50% of the instructions can be reused in this way. Although their work
is also focused on performance, the implications of instruction reuse for power consumption
can be quite important with such a large reuse rate.
Sodani and Sohi propose three schemes to implement instruction reuse. The first two are
simply caches of inputs and results called Reuse Buffers (RB). The first bases its reuse test on input
values: when an instruction sees the same input values again, the stored result is used. The second
simplifies the reuse test and reduces the required storage space per RB entry by relying not
on input values but on input register names. Reuse of an instruction depends on whether it
operates on the same registers as before. RB entries in this case are invalidated when the
registers they read are written. In both schemes, the reuse of a load is predicated upon the
corresponding memory location not having been written. RB entries corresponding to loads are
thus invalidated when their address is written. Finally, the third scheme takes into account not
only register names but also dependence chains, tracking the reuse status of such instruction
chains. It carries, however, considerable complexity and hence increased power consumption.
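The first two schemes can be sketched in software as follows. This is an illustrative model under stated assumptions, not the organization proposed in [208]: the class names, the choice to key entries by instruction address (`pc`), and the modeling of invalidation as entry deletion are all hypothetical. The third, dependence-chain scheme is not modeled.

```python
class ValueReuseBuffer:
    """Scheme 1 sketch: the reuse test compares input values."""

    def __init__(self):
        # pc -> (operand_values, result, load_address or None)
        self.entries = {}

    def insert(self, pc, operand_values, result, load_address=None):
        self.entries[pc] = (tuple(operand_values), result, load_address)

    def lookup(self, pc, operand_values):
        entry = self.entries.get(pc)
        if entry is not None and entry[0] == tuple(operand_values):
            return True, entry[1]  # same inputs as before: reuse the result
        return False, None

    def on_store(self, address):
        # A load may be reused only if its memory location was not written.
        stale = [pc for pc, e in self.entries.items() if e[2] == address]
        for pc in stale:
            del self.entries[pc]


class NameReuseBuffer:
    """Scheme 2 sketch: the reuse test relies on register names only."""

    def __init__(self):
        # pc -> (source_register_names, result); presence means "valid"
        self.entries = {}

    def insert(self, pc, source_regs, result):
        self.entries[pc] = (frozenset(source_regs), result)

    def lookup(self, pc):
        entry = self.entries.get(pc)
        if entry is not None:
            return True, entry[1]  # no source register written since insert
        return False, None

    def on_register_write(self, reg):
        # Writing a register invalidates every entry that reads it.
        stale = [pc for pc, (srcs, _) in self.entries.items() if reg in srcs]
        for pc in stale:
            del self.entries[pc]


vb = ValueReuseBuffer()
vb.insert(0x40, (2, 3), 5)                   # e.g. an add of 2 and 3
vb.insert(0x44, (100,), 7, load_address=100) # e.g. a load from address 100
vb.on_store(100)                             # a store kills the load entry

nb = NameReuseBuffer()
nb.insert(0x40, ["r1", "r2"], 5)
valid_before = nb.lookup(0x40)
nb.on_register_write("r2")
valid_after = nb.lookup(0x40)
```

The sketch shows the trade-off described above: the name-based buffer stores no operand values and its reuse test is a simple validity check, but any write to a source register discards the entry even when the written value is unchanged.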
Basic block level: Huang and Lilja take reuse one step further and discuss basic block reuse
[107]. Their observations concern whole basic blocks, whose inputs and outputs they found
to be quite regular and predictable. Their studies show that for the SPEC95
benchmarks, the vast majority of basic blocks (90%) have few input and output registers (up to
four and five, respectively) and read and write only a few memory locations (up to four and two,
respectively).
As with the RB, a block history buffer (BHB) stores inputs and outputs of
basic blocks and provides reuse at the basic block level. The increased number of inputs that
must match for the result to be determinable means that basic block reuse is not as prevalent as
instruction reuse. However, when reuse succeeds, it not only avoids the execution of the individual