As a chain head is promoted from segment to segment, it pulls its chain of dependent instructions behind it, like the wagons of a train. A large number of small segments allows for a very large instruction queue and, at the same time, for faster clock speeds. However, compared to a monolithic design of the same size (and assuming equal frequency), the dependence-chain IQ delivers somewhat less performance. On the other hand, because the part of the IQ involved in issuing instructions is quite small, power consumption can be substantially lower than that of a monolithic design.
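As an illustration, the following Python sketch models one plausible behavior of a segmented, dependence-chain instruction queue: only the segment closest to the functional units performs wakeup and select, while instructions are promoted toward it segment by segment. The class names, segment sizes, and FIFO promotion policy are assumptions made for the sketch, not the published design.

```python
# Behavioral sketch (not the published design) of a segmented IQ:
# only segment 0 issues; instructions are promoted toward it as space
# opens up, so a chain head reaches the issue segment first and its
# dependants follow behind it.

from collections import deque

class Instr:
    def __init__(self, name, deps=()):
        self.name = name          # mnemonic, for tracing
        self.deps = set(deps)     # names of producing instructions

class SegmentedIQ:
    def __init__(self, num_segments=8, seg_size=4):
        self.segments = [deque() for _ in range(num_segments)]
        self.seg_size = seg_size

    def insert(self, instr):
        # Newly dispatched instructions enter the farthest segment.
        self.segments[-1].append(instr)

    def cycle(self, completed):
        # Mark operands produced this cycle as available.
        for seg in self.segments:
            for i in seg:
                i.deps -= completed

        # Segment 0 is the only one wired to wakeup/select, so the
        # selection logic stays small, fast, and low power.
        issued = [i for i in self.segments[0] if not i.deps]
        for i in issued:
            self.segments[0].remove(i)

        # Promote instructions one segment closer to issue. Because a
        # chain head precedes its dependants in the queue, promotion
        # drags the whole chain toward segment 0 behind it.
        for s in range(1, len(self.segments)):
            dst, src = self.segments[s - 1], self.segments[s]
            while src and len(dst) < self.seg_size:
                dst.append(src.popleft())
        return issued
```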
Lebeck et al. also propose a technique based on dependence chains [149]. In their scheme,
the instructions depending on a long-latency operation (e.g., a cache miss) are moved out of
the issue queue into a much larger “waiting instruction buffer” (WIB). A 2K-entry WIB with
a 32-entry IQ yields noteworthy speedups over a conventional 32-entry IQ. Although a power
analysis was not included in their work, it is likely that the power benefits for such a design
would also be noticeable compared to a (large) monolithic IQ at the same performance level.
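The sketch below illustrates the WIB idea at a behavioral level. The class and method names, the dependence test, and the capacity handling are placeholders rather than the structures of Lebeck et al.: instructions that depend on a pending miss vacate the small issue queue and wait in the much larger buffer until the miss returns.

```python
# Illustrative model (with assumed names) of a waiting instruction
# buffer (WIB): instructions dependent on an outstanding cache miss
# vacate the small issue queue and wait in a much larger buffer.

class WIBScheduler:
    def __init__(self, iq_size=32, wib_size=2048):
        self.iq = []                 # small issue queue (wakeup/select)
        self.wib = {}                # miss tag -> parked instructions
        self.iq_size, self.wib_size = iq_size, wib_size

    def dispatch(self, instr):
        if len(self.iq) < self.iq_size:
            self.iq.append(instr)
            return True
        return False                 # stall dispatch if the IQ is full

    def on_cache_miss(self, miss_tag, depends_on_miss):
        # Move every IQ entry that depends on the missing load out to
        # the WIB, freeing IQ slots for independent work.
        parked = [i for i in self.iq if depends_on_miss(i)]
        used = sum(len(v) for v in self.wib.values())
        if used + len(parked) > self.wib_size:
            return                   # WIB full: leave entries in the IQ
        for i in parked:
            self.iq.remove(i)
        self.wib.setdefault(miss_tag, []).extend(parked)

    def on_miss_return(self, miss_tag):
        # Re-insert the parked chain; return any entries that did not
        # fit so the caller can retry them on a later cycle.
        waiting = self.wib.pop(miss_tag, [])
        return [i for i in waiting if not self.dispatch(i)]
```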
Similarly, hierarchical scheduling windows (HSW) [34] increase the instruction window
size using a fast (but small) and a slow (but large) scheduling window. Latency-sensitive critical
instructions are moved to the fast scheduling window while latency-tolerant instructions remain
in the larger slower window. The difference from the previous work is that both scheduling
windows issue instructions, each to a separate cluster.
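A rough sketch of this two-level organization follows. The criticality predicate, window sizes, and cluster interfaces are hypothetical; the sketch only shows the split between a small fast window and a large slow one, with each window issuing to its own cluster.

```python
# Sketch of a two-level hierarchical scheduling window (HSW):
# both windows issue, but each feeds its own execution cluster.

class HierarchicalWindow:
    def __init__(self, fast_size=16, slow_size=128):
        self.fast = []               # small, fast window: critical chain
        self.slow = []               # large, slow window: latency-tolerant work
        self.fast_size, self.slow_size = fast_size, slow_size

    def dispatch(self, instr, is_critical):
        # is_critical stands in for a criticality predictor (e.g.,
        # instructions feeding branches or address calculations).
        if is_critical(instr) and len(self.fast) < self.fast_size:
            self.fast.append(instr)
        elif len(self.slow) < self.slow_size:
            self.slow.append(instr)
        else:
            return False             # both windows full: stall dispatch
        return True

    def issue(self, ready):
        # Each window selects among its own entries only, so neither
        # select loop has to span the full window capacity.
        to_fast_cluster = [i for i in self.fast if ready(i)]
        to_slow_cluster = [i for i in self.slow if ready(i)]
        self.fast = [i for i in self.fast if i not in to_fast_cluster]
        self.slow = [i for i in self.slow if i not in to_slow_cluster]
        return to_fast_cluster, to_slow_cluster
```

Keeping the fast window small is what preserves a short select loop; the large slow window only has to tolerate a slower schedule, which latency-insensitive instructions can afford.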
Other proposals focus on reducing the cost, and by extension the complexity and power, of checkpointing, which is a serious impediment to large instruction windows [5, 82]. Finally, there are also proposals that employ segmentation and resizing, as well as more targeted optimizations, to reduce the design complexity of load/store queues [179]. All of these techniques are power-efficient in the sense that they attempt to increase performance while making very prudent and frugal use of resources.
4.7 IDLE-CAPACITY SWITCHING ACTIVITY: CORE
Besides the instruction window, significant opportunities for further power optimization still abound in a dynamically scheduled out-of-order processor. A dimension related to the instruction window size is the issue width: the number of instructions that can go through the processor in parallel. Although we talk about issue width, we consider such techniques under
the Idle-Capacity optimizations. The reason is that, in contrast to idle-width optimizations dis-
cussed in Sections 4.3 and 4.4, adapting the issue width has nothing to do with the bit-width
of individual operations (arithmetic, logic, or memory operations) but rather with the behavior
of the program at a larger scale. This is consistent with, and in fact very similar to, the other
idle-capacity optimizations presented in Sections 4.5 and 4.8.
Depending on the instruction window size, different programs exhibit different maxima for parallel instruction issue. Adapting the processor to this dimension was proposed by Bahar and Manne [19]. They propose to dynamically change the width of an 8-issue processor to