pipeline to accomplish this. This is a work-steering strategy that is also revisited in Chapter 4
for effective capacitance optimizations (Section 4.13).⁴
Fields, Bodik, and Hill use this work-steering approach as an example of how instruction
slack can be exploited [76]. Their study shows that instructions exhibit significant slack: in
many cases, an instruction can be delayed several cycles without any impact on the program's
critical path and hence its performance [76]. Furthermore, they classify instruction slack into
local, global, and apportioned. Local slack exists when an instruction can be delayed without
any impact on any other instruction. Global slack exists when delaying an instruction does not
delay the last instruction of the program (i.e., the total execution time is unaffected). Apportioned
slack is slack distributed among a group of instructions so that they can all be delayed together
without impact on execution time; its value depends on how it is apportioned from the
instructions' individual slack [76].
To measure slack, Fields et al. use an offline analysis (similar to the Semeraro et al.
offline approach used in [200] and described in Section 3.4.1) that creates a dependence
graph of the execution, taking into account both data dependencies and microarchitectural
resource constraints. Their offline approach allows the calculation of all three types of slack.
Their results show that there is enormous potential for exploiting slack by slowing down
instructions [76].
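To make the slack definitions concrete, the dependence-graph analysis can be sketched in a few lines of Python. This is a toy illustration only: the instruction names, latencies, and edges are invented, and the real analysis of Fields et al. also models microarchitectural resource constraints, which are omitted here.

```python
# Toy offline slack analysis over an instruction dependence graph
# (names, latencies, and edges are invented for illustration; the
# real analysis also models microarchitectural resource edges).
deps = {          # instruction -> instructions that depend on it
    "i1": ["i2", "i3"],
    "i2": ["i4"],
    "i3": ["i4"],
    "i4": [],     # i4 is the last instruction
}
latency = {"i1": 1, "i2": 3, "i3": 1, "i4": 1}
order = ["i1", "i2", "i3", "i4"]          # a topological order

preds = {n: [] for n in order}
for n, succs in deps.items():
    for s in succs:
        preds[s].append(n)

# Earliest completion times (forward pass along the longest path).
earliest = {}
for n in order:
    start = max((earliest[p] for p in preds[n]), default=0)
    earliest[n] = start + latency[n]
finish = max(earliest.values())           # total execution time

# Latest completion times that still do not delay the last instruction.
latest = {}
for n in reversed(order):
    latest[n] = (finish if not deps[n]
                 else min(latest[s] - latency[s] for s in deps[n]))

# Global slack: delay tolerated without delaying the last instruction.
global_slack = {n: latest[n] - earliest[n] for n in order}
# Local slack: delay tolerated without delaying ANY other instruction.
local_slack = {
    n: min((earliest[s] - latency[s] - earliest[n] for s in deps[n]),
           default=finish - earliest[n])
    for n in order
}
print(global_slack)   # i3 can absorb 2 cycles of delay
```

In this toy graph, i3 has two cycles of both local and global slack (it completes at cycle 2, but i4 cannot start before cycle 4), while i1, i2, and i4 lie on the critical path and have none. Apportioned slack would then distribute such slack across several instructions along a non-critical path.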
More interestingly, Fields et al. show that one can dynamically predict slack in hardware.
This is significant because it opens the possibility of fine-grain control policies that operate
on a per-instruction basis. The online control policies discussed previously for DVFS in MCD
processors cannot treat each instruction individually: there is simply no way to change the
execution frequency dynamically for each individual instruction. Instead, the frequency of
each domain is adjusted according to the aggregate behavior of all the instructions processed in
that domain over the course of a sampling interval (Section 3.4.1).
With work steering, the execution frequencies are fixed for each execution pipeline, as
is the case for the fast and slow pipelines in Figure 3.7, and instructions are steered toward
the appropriate pipeline. All that is needed to implement work steering is a good estimate
of each instruction's slack, and this is where prediction comes into play. According to
Fields et al., for 68% of the static instructions, 90% of their dynamic instances have enough
slack to double their latency. This slack “locality” allows slack prediction to be based on sparse
sampling of dynamic instructions to determine their slack.
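A sparsely sampling slack predictor driving work steering could be sketched as follows. This is a hypothetical software model, not Fields et al.'s hardware design: the table organization, the sampling period, and the measure_slack callback (standing in for the run-time slack measurement) are all assumptions made for illustration.

```python
# Hypothetical model of slack prediction with sparse sampling and
# work steering (table layout, sampling period, and measure_slack
# are illustrative assumptions, not the actual hardware design).

SAMPLE_PERIOD = 100   # measure roughly one in 100 dynamic instances

class SlackPredictor:
    def __init__(self):
        self.table = {}   # static PC -> "has enough slack" bit
        self.count = 0    # dynamic instruction counter

    def steer(self, pc, measure_slack, latency):
        """Steer to the 'slow' pipeline if this static instruction is
        predicted to tolerate a doubled latency, else to 'fast'."""
        self.count += 1
        if self.count % SAMPLE_PERIOD == 0:
            # Sparse sampling: occasionally measure the real slack of
            # one dynamic instance and update the per-PC prediction.
            self.table[pc] = measure_slack(pc) >= latency
        return "slow" if self.table.get(pc, False) else "fast"

predictor = SlackPredictor()
# measure_slack stands in for the run-time measurement mechanism;
# here it always reports 5 cycles of slack for the sampled instance.
results = [predictor.steer(0x40, lambda pc: 5, latency=2)
           for _ in range(201)]
print(results[0], results[-1])
```

Predictions default to the fast pipeline until a sampled dynamic instance demonstrates sufficient slack; a real design would also age or reset the prediction bits as program behavior changes.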
Slack prediction would not be feasible if slack could not be measured efficiently at
run-time. To determine whether an instruction has slack, Fields et al. employ an elegant delay-
⁴ There too, multiple components are provided, offering a range of power/performance characteristics, and work
(computation) is dynamically steered according to run-time conditions and goals.