The structures that implement instruction scheduling, especially the instruction queue,
have been prime targets for power optimizations, and for good reason: not only does instruction
scheduling consume a significant part of a processor's total power, but it also does not scale
well to larger sizes. Many other techniques followed in the literature, building on the ones
discussed here and offering further improvements and optimizations.
4.6.6 Related Work on Instruction Windows
The importance of the instruction window for performance, its high complexity, and its
criticality (in terms of latency) make it a prime target for optimizations. Here, we give a short
overview of some of the most relevant work. Even though this work focuses mostly on
performance or complexity and does not specifically address power consumption, it can have
significant ramifications for power.
Palacharla, Jouppi, and Smith, in the context of their work on complexity-effective
architectures [177], first proposed lowering the complexity of a monolithic CAM-based instruction
queue by replacing it with a number of FIFO queues. Instead of searching the whole IQ for
ready instructions, the scheduler limits the search to the heads of the FIFOs. Subsequently, Canal
and Gonzalez [45, 46] and, independently, Michaud and Seznec [166] proposed dataflow
scheduling arrays, augmented with fully associative buffers to accommodate instructions of
unpredictable latency.
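The FIFO organization can be illustrated with a small Python sketch. It is a simplified model, not the authors' design: the names `Instr`, `FifoIssueQueue`, `dispatch`, and `select` are hypothetical, and the steering rule (place an instruction behind its producer if that producer sits at a FIFO tail, otherwise in an empty FIFO, otherwise stall dispatch) is an assumption loosely modeled on dependence-based steering. What the sketch shows is the key point of the text: the select step inspects only the FIFO heads rather than every queue entry.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    srcs: tuple  # names of producer instructions (hypothetical encoding)


class FifoIssueQueue:
    """Sketch: several FIFOs replacing one associatively searched queue."""

    def __init__(self, num_fifos: int, depth: int):
        self.fifos = [deque() for _ in range(num_fifos)]
        self.depth = depth

    def dispatch(self, instr: Instr) -> bool:
        # Assumed steering rule, part 1: append behind a producer that is
        # currently the tail of some FIFO, if that FIFO still has room.
        for q in self.fifos:
            if q and q[-1].name in instr.srcs and len(q) < self.depth:
                q.append(instr)
                return True
        # Part 2: otherwise start a new chain in an empty FIFO.
        for q in self.fifos:
            if not q:
                q.append(instr)
                return True
        return False  # no suitable FIFO: dispatch stalls this cycle

    def select(self, completed: set) -> list:
        # Only the head of each FIFO is checked for readiness; entries
        # behind the head are never examined.
        issued = []
        for q in self.fifos:
            if q and all(s in completed for s in q[0].srcs):
                issued.append(q.popleft())
        return issued
```

In this model, dispatching two independent instructions into a queue with a single FIFO makes the second one stall, because the assumed steering rule only places an instruction without a tail producer into an empty FIFO.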
Following these initial proposals for reducing the complexity of an ordinary-sized
instruction queue, the converse idea came into focus: instead of reducing complexity, use
these techniques to actually enlarge the instruction queue. Two groups, Raasch, Binkert, and
Reinhardt and, independently, Lebeck, Koppanalil, Patwardhan, and Rotenberg, proposed to
enlarge the IQ using dependence chains [186, 149]. Dependence chains are groups of dependent
instructions, usually headed by an instruction of unpredictable latency. A head instruction can
itself be a dependent instruction in another dependence chain.
Raasch et al. proposed a segmented IQ design in which instructions are promoted from
segment to segment until they reach the “issue-buffer” segment. From there (and only there)
they can issue to the functional units. Instructions are placed in the various IQ segments
according to their expected delay in becoming ready. For example, ready instructions and their
immediate dependents are placed in the first (issue-buffer) segment; instructions that are two
or three cycles away from becoming ready are placed in the second segment (after the issue
buffer); and so on. In each cycle, instructions are promoted from segment to segment as
they advance toward becoming ready.
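A minimal Python sketch of the segmented organization follows, under several assumptions that are not in the text: each instruction's expected delay is a known, fixed number of cycles (unpredictable latencies are ignored), each segment covers two cycles of expected delay (inferred from the example above), and delays count down in absolute cycles rather than relative to a chain head. `Entry`, `SegmentedIQ`, `insert`, and `tick` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    delay: int  # expected cycles until operands are ready (assumed known)


class SegmentedIQ:
    def __init__(self, num_segments: int, cycles_per_segment: int = 2):
        # segments[0] is the issue buffer; higher indices are further away.
        self.segments = [[] for _ in range(num_segments)]
        self.span = cycles_per_segment

    def _segment_for(self, delay: int) -> int:
        # Map expected delay to a segment; the last segment absorbs long delays.
        return min(delay // self.span, len(self.segments) - 1)

    def insert(self, e: Entry) -> None:
        self.segments[self._segment_for(e.delay)].append(e)

    def tick(self) -> list:
        # Issue: only the issue buffer may issue, and only entries whose
        # expected delay has reached zero.
        buf = self.segments[0]
        issued = [e for e in buf if e.delay == 0]
        self.segments[0] = [e for e in buf if e.delay > 0]
        # Advance one cycle: decrement each remaining delay and re-place the
        # entry. With delay // span, an entry moves at most one segment
        # closer to the issue buffer per cycle.
        remaining = [e for seg in self.segments for e in seg]
        self.segments = [[] for _ in self.segments]
        for e in remaining:
            e.delay = max(e.delay - 1, 0)
            self.insert(e)
        return [e.name for e in issued]
```

With three segments, an instruction inserted with an expected delay of 5 starts in the last segment, moves one segment closer to the issue buffer every two cycles, and issues only once its delay reaches zero inside the issue buffer.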
The important innovation in their proposal is that the expected delay of instructions is
not measured in absolute time but relative to the head of the dependence chain. Thus, as the