As a chain head is promoted from segment to segment, it pulls its chain of dependent instructions behind it, like the wagons of a train. A large number of small segments allows for a very large instruction queue and, at the same time, for faster clock speeds. However, compared to a monolithic design of the same size (and assuming equal frequency), the dependence-chain IQ delivers somewhat less performance. On the other hand, because the part of the IQ involved in issuing instructions is quite small, power consumption can be substantially lower than that of a monolithic design.
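As an illustration, the following Python sketch models one plausible behavior of a segmented, dependence-chain instruction queue: only the segment closest to the functional units performs wakeup and select, while instructions are promoted toward it segment by segment. The class names, segment sizes, and FIFO promotion policy are assumptions made for the sketch, not the published design.

```python
# Behavioral sketch (not the published design) of a segmented IQ:
# only segment 0 issues; instructions are promoted toward it as space
# opens up, so a chain head reaches the issue segment first and its
# dependants follow behind it.

from collections import deque

class Instr:
    def __init__(self, name, deps=()):
        self.name = name          # mnemonic, for tracing
        self.deps = set(deps)     # names of producing instructions

class SegmentedIQ:
    def __init__(self, num_segments=8, seg_size=4):
        self.segments = [deque() for _ in range(num_segments)]
        self.seg_size = seg_size

    def insert(self, instr):
        # Newly dispatched instructions enter the farthest segment.
        self.segments[-1].append(instr)

    def cycle(self, completed):
        # Mark operands produced this cycle as available.
        for seg in self.segments:
            for i in seg:
                i.deps -= completed

        # Segment 0 is the only one wired to wakeup/select, so the
        # selection logic stays small, fast, and low power.
        issued = [i for i in self.segments[0] if not i.deps]
        for i in issued:
            self.segments[0].remove(i)

        # Promote instructions one segment closer to issue. Because a
        # chain head precedes its dependants in the queue, promotion
        # drags the whole chain toward segment 0 behind it.
        for s in range(1, len(self.segments)):
            dst, src = self.segments[s - 1], self.segments[s]
            while src and len(dst) < self.seg_size:
                dst.append(src.popleft())
        return issued
```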
Lebeck et al. also propose a technique based on dependence chains [149]. In their scheme,
the instructions depending on a long-latency operation (e.g., a cache miss) are moved out of
the issue queue into a much larger “waiting instruction buffer” (WIB). A 2K-entry WIB with
a 32-entry IQ yields noteworthy speedups over a conventional 32-entry IQ. Although a power
analysis was not included in their work, it is likely that the power benefits for such a design
would also be noticeable compared to a (large) monolithic IQ at the same performance level.
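The sketch below illustrates the WIB idea at a behavioral level. The class and method names, the dependence test, and the capacity handling are placeholders rather than the structures of Lebeck et al.: instructions that depend on a pending miss vacate the small issue queue and wait in the much larger buffer until the miss returns.

```python
# Illustrative model (with assumed names) of a waiting instruction
# buffer (WIB): instructions dependent on an outstanding cache miss
# vacate the small issue queue and wait in a much larger buffer.

class WIBScheduler:
    def __init__(self, iq_size=32, wib_size=2048):
        self.iq = []                 # small issue queue (wakeup/select)
        self.wib = {}                # miss tag -> parked instructions
        self.iq_size, self.wib_size = iq_size, wib_size

    def dispatch(self, instr):
        if len(self.iq) < self.iq_size:
            self.iq.append(instr)
            return True
        return False                 # stall dispatch if the IQ is full

    def on_cache_miss(self, miss_tag, depends_on_miss):
        # Move every IQ entry that depends on the missing load out to
        # the WIB, freeing IQ slots for independent work.
        parked = [i for i in self.iq if depends_on_miss(i)]
        used = sum(len(v) for v in self.wib.values())
        if used + len(parked) > self.wib_size:
            return                   # WIB full: leave entries in the IQ
        for i in parked:
            self.iq.remove(i)
        self.wib.setdefault(miss_tag, []).extend(parked)

    def on_miss_return(self, miss_tag):
        # Re-insert the parked chain; return any entries that did not
        # fit so the caller can retry them on a later cycle.
        waiting = self.wib.pop(miss_tag, [])
        return [i for i in waiting if not self.dispatch(i)]
```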
Similarly, hierarchical scheduling windows (HSW) [34] increase the instruction window
size using a fast (but small) and a slow (but large) scheduling window. Latency-sensitive critical
instructions are moved to the fast scheduling window while latency-tolerant instructions remain
in the larger slower window. The difference from the previous work is that both scheduling
windows issue instructions, each to a separate cluster.
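A rough sketch of this two-level organization follows. The criticality predicate, window sizes, and cluster interfaces are hypothetical; the sketch only shows the split between a small fast window and a large slow one, with each window issuing to its own cluster.

```python
# Sketch of a two-level hierarchical scheduling window (HSW):
# both windows issue, but each feeds its own execution cluster.

class HierarchicalWindow:
    def __init__(self, fast_size=16, slow_size=128):
        self.fast = []               # small, fast window: critical chain
        self.slow = []               # large, slow window: latency-tolerant work
        self.fast_size, self.slow_size = fast_size, slow_size

    def dispatch(self, instr, is_critical):
        # is_critical stands in for a criticality predictor (e.g.,
        # instructions feeding branches or address calculations).
        if is_critical(instr) and len(self.fast) < self.fast_size:
            self.fast.append(instr)
        elif len(self.slow) < self.slow_size:
            self.slow.append(instr)
        else:
            return False             # both windows full: stall dispatch
        return True

    def issue(self, ready):
        # Each window selects among its own entries only, so neither
        # select loop has to span the full window capacity.
        to_fast_cluster = [i for i in self.fast if ready(i)]
        to_slow_cluster = [i for i in self.slow if ready(i)]
        self.fast = [i for i in self.fast if i not in to_fast_cluster]
        self.slow = [i for i in self.slow if i not in to_slow_cluster]
        return to_fast_cluster, to_slow_cluster
```

Keeping the fast window small is what preserves a short select loop; the large slow window only has to tolerate a slower schedule, which latency-insensitive instructions can afford.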
Other proposals focus on reducing the cost, and by extension the complexity and power, of checkpointing, which is a serious impediment to large instruction windows [5, 82]. Finally, there are also proposals that employ segmentation and resizing, as well as more targeted optimizations, to reduce the design complexity of load/store queues [179]. All of these techniques are power-efficient in the sense that they attempt to increase performance while making very prudent and frugal use of resources.
4.7 IDLE-CAPACITY SWITCHING ACTIVITY: CORE
Besides the instruction window, significant opportunities for further power optimization still abound in a dynamically scheduled out-of-order processor. A dimension related to the instruction window size is the issue width: the number of instructions that can go through the processor in parallel. Although we talk about issue width, we consider such techniques under
the Idle-Capacity optimizations. The reason is that, in contrast to idle-width optimizations dis-
cussed in Sections 4.3 and 4.4, adapting the issue width has nothing to do with the bit-width
of individual operations (arithmetic, logic, or memory operations) but rather with the behavior
of the program at a larger scale. This is consistent with, and in fact very similar to, the other
idle-capacity optimizations presented in Sections 4.5 and 4.8.
Depending on the instruction window size, different programs exhibit different maxima for parallel instruction issue. Adapting the processor to this dimension was proposed by Bahar and Manne [19]. They propose to dynamically change the width of an 8-issue processor to