pipeline to accomplish this. This is a work-steering strategy that is also revisited in Chapter 4
for effective capacitance optimizations (Section 4.13).⁴
Fields, Bodik, and Hill use this work-steering approach as an example of how instruction
slack can be exploited [76]. Their study shows that instructions exhibit significant slack: in
many cases, an instruction can be delayed several cycles without any impact on the program's
critical path and hence its performance [76]. Furthermore, they classify instruction slack into
local, global, and apportioned. Local slack exists when an instruction can be delayed without
any impact on any other instruction. Global slack exists when delaying an instruction does not
delay the last instruction of the program (i.e., the total execution time is unaffected). Apportioned
slack is slack distributed among a group of instructions so that they can all be delayed together
without impact on execution time; its value depends on how it is apportioned from the
instructions' individual slack [76].
To measure slack, Fields et al. use an offline analysis (similar to the Semeraro et al.
offline approach used in [200] and described in Section 3.4.1) that creates a dependence
graph of the execution, taking into account both data dependencies and microarchitectural
resource constraints. Their offline approach allows the calculation of all three types of slack.
Their results show that there is enormous potential for exploiting slack by slowing down
instructions [76].
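To make the slack definitions concrete, the dependence-graph analysis can be sketched in a few lines of Python. This is a toy illustration only: the instruction names, latencies, and edges are invented, and the real analysis of Fields et al. also models microarchitectural resource constraints, which are omitted here.

```python
# Toy offline slack analysis over an instruction dependence graph
# (names, latencies, and edges are invented for illustration; the
# real analysis also models microarchitectural resource edges).
deps = {          # instruction -> instructions that depend on it
    "i1": ["i2", "i3"],
    "i2": ["i4"],
    "i3": ["i4"],
    "i4": [],     # i4 is the last instruction
}
latency = {"i1": 1, "i2": 3, "i3": 1, "i4": 1}
order = ["i1", "i2", "i3", "i4"]          # a topological order

preds = {n: [] for n in order}
for n, succs in deps.items():
    for s in succs:
        preds[s].append(n)

# Earliest completion times (forward pass along the longest path).
earliest = {}
for n in order:
    start = max((earliest[p] for p in preds[n]), default=0)
    earliest[n] = start + latency[n]
finish = max(earliest.values())           # total execution time

# Latest completion times that still do not delay the last instruction.
latest = {}
for n in reversed(order):
    latest[n] = (finish if not deps[n]
                 else min(latest[s] - latency[s] for s in deps[n]))

# Global slack: delay tolerated without delaying the last instruction.
global_slack = {n: latest[n] - earliest[n] for n in order}
# Local slack: delay tolerated without delaying ANY other instruction.
local_slack = {
    n: min((earliest[s] - latency[s] - earliest[n] for s in deps[n]),
           default=finish - earliest[n])
    for n in order
}
print(global_slack)   # i3 can absorb 2 cycles of delay
```

In this toy graph, i3 has two cycles of both local and global slack (it completes at cycle 2, but i4 cannot start before cycle 4), while i1, i2, and i4 lie on the critical path and have none. Apportioned slack would then distribute such slack across several instructions along a non-critical path.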
More interestingly, Fields et al. show that one can dynamically predict slack in hardware.
This is significant because it opens the possibility of fine-grain control policies that operate
on a per-instruction basis. The online control policies discussed previously for DVFS in MCD
processors cannot treat each instruction individually: there is simply no way to change the
execution frequency dynamically for each individual instruction. Instead, the frequency of
each domain is adjusted according to the aggregate behavior of all the instructions processed in
that domain over the course of a sampling interval (Section 3.4.1).
With work steering, the execution frequencies are fixed for each execution pipeline, as
is the case for the fast and slow pipelines in Figure 3.7, and instructions are steered toward
the appropriate pipeline. All that is needed to implement work steering is a good estimate
of each instruction's slack, and this is where prediction comes into play. According to
Fields et al., for 68% of the static instructions, 90% of their dynamic instances have enough
slack to double their latency. This slack “locality” allows slack prediction to be based on sparse
sampling of dynamic instructions to determine their slack.
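A sparsely sampling slack predictor driving work steering could be sketched as follows. This is a hypothetical software model, not Fields et al.'s hardware design: the table organization, the sampling period, and the measure_slack callback (standing in for the run-time slack measurement) are all assumptions made for illustration.

```python
# Hypothetical model of slack prediction with sparse sampling and
# work steering (table layout, sampling period, and measure_slack
# are illustrative assumptions, not the actual hardware design).

SAMPLE_PERIOD = 100   # measure roughly one in 100 dynamic instances

class SlackPredictor:
    def __init__(self):
        self.table = {}   # static PC -> "has enough slack" bit
        self.count = 0    # dynamic instruction counter

    def steer(self, pc, measure_slack, latency):
        """Steer to the 'slow' pipeline if this static instruction is
        predicted to tolerate a doubled latency, else to 'fast'."""
        self.count += 1
        if self.count % SAMPLE_PERIOD == 0:
            # Sparse sampling: occasionally measure the real slack of
            # one dynamic instance and update the per-PC prediction.
            self.table[pc] = measure_slack(pc) >= latency
        return "slow" if self.table.get(pc, False) else "fast"

predictor = SlackPredictor()
# measure_slack stands in for the run-time measurement mechanism;
# here it always reports 5 cycles of slack for the sampled instance.
results = [predictor.steer(0x40, lambda pc: 5, latency=2)
           for _ in range(201)]
print(results[0], results[-1])
```

Predictions default to the fast pipeline until a sampled dynamic instance demonstrates sufficient slack; a real design would also age or reset the prediction bits as program behavior changes.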
Slack prediction would not be feasible if slack could not be measured efficiently at
run-time. To determine whether an instruction has slack, Fields et al. employ an elegant delay-
⁴ There too, multiple components are provided, offering a range of power/performance characteristics, and work
(computation) is dynamically steered according to run-time conditions and goals.