Despite the magnitude of these savings (the low-hanging fruit in this case), Folegnani
and Gonzalez go one step further: they resize the IQ to fit program needs. The interesting
difference from the earlier proposals is that the IQ is resized logically, not physically, by
partitioning it and selectively disabling parts of it. Besides the head and tail pointers, they
introduce a new pointer, called the limit pointer, which always moves at a fixed offset from
the head pointer. This pointer limits the logical size of the instruction queue by excluding
the entries between the head pointer and itself from being allocated. Figure 4.15 shows the
new “disabled” area defined by that pointer. The effect is to add a known number (the offset
from the head pointer) of guaranteed empty entries that will not participate in tag matching.
The question now is how to maximize the disabled area without negatively impacting performance.
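The limit-pointer idea can be illustrated with a small software model. The following is only a sketch under simplifying assumptions, not the hardware design from the paper: the class name `LogicallyResizedIQ`, its methods, and the queue size are hypothetical, and the model assumes entries leave the queue in order from the head (a real IQ issues out of order and can have holes). The limit pointer is taken to sit `disabled` entries behind the head in the circular buffer, so the tail can never allocate into that region.

```python
class LogicallyResizedIQ:
    """Toy circular instruction queue with a logically disabled region.

    Assumption: entries are freed in order from the head, which is a
    simplification of a real out-of-order issue queue.
    """

    def __init__(self, size=8, disabled=0):
        self.size = size          # physical number of entries
        self.disabled = disabled  # fixed offset between limit and head
        self.head = 0             # oldest occupied entry
        self.count = 0            # number of occupied entries

    @property
    def tail(self):
        # Next entry to allocate.
        return (self.head + self.count) % self.size

    @property
    def limit(self):
        # The limit pointer trails the head by a fixed offset.
        return (self.head - self.disabled) % self.size

    def disabled_slots(self):
        # Entries between the limit pointer and the head: guaranteed empty.
        return [(self.limit + i) % self.size for i in range(self.disabled)]

    def can_allocate(self):
        # The tail may not advance into the disabled region.
        return self.count < self.size - self.disabled

    def allocate(self):
        if not self.can_allocate():
            return None
        slot = self.tail
        self.count += 1
        return slot

    def free_oldest(self):
        if self.count == 0:
            return None
        slot = self.head
        self.head = (self.head + 1) % self.size
        self.count -= 1
        return slot
```

In this model, an 8-entry queue with `disabled=3` accepts at most five instructions at a time. When the head advances, the disabled region moves with it, so the guaranteed-empty entries are always the ones just behind the head.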
ILP-contribution feedback control: This is done using a heuristic with empirically chosen
parameters. The IQ is logically divided into 16 partitions. The idea behind the heuristic is to
measure the contribution to performance of the youngest partition of the IQ, which is the
partition allocated most recently at the tail pointer. The contribution of a partition is
measured as the number of instructions issued from this partition within a time window. If that
contribution is below some empirically chosen threshold, then the effective size of the IQ is
reduced by expanding the disabled area. Periodically, the effective IQ size is increased (by
contracting the disabled area). This simple scheme increases the energy savings to about 91%
with a modest 1.7% IPC loss.
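The feedback loop described above can be sketched as a small controller. This is an illustrative model only: the text states that the parameters are empirically chosen but does not give them here, so the `threshold` and `grow_period` values, the one-partition step size, and the name `IQResizeController` are all assumptions made for the example.

```python
class IQResizeController:
    """Toy model of ILP-contribution feedback control for IQ sizing.

    All numeric parameters except the 16 partitions are hypothetical.
    """

    def __init__(self, num_partitions=16, threshold=2, grow_period=4):
        self.num_partitions = num_partitions
        self.enabled = num_partitions   # partitions currently usable
        self.threshold = threshold      # min issues expected per window
        self.grow_period = grow_period  # grow once every N windows
        self.windows_seen = 0

    def end_of_window(self, issued_from_youngest):
        """Update the effective IQ size at the end of a time window.

        issued_from_youngest: instructions issued from the youngest
        enabled partition during the window that just ended.
        """
        self.windows_seen += 1
        if self.windows_seen % self.grow_period == 0:
            # Periodic growth: contract the disabled area by one partition.
            self.enabled = min(self.num_partitions, self.enabled + 1)
        elif issued_from_youngest < self.threshold and self.enabled > 1:
            # Low contribution: expand the disabled area by one partition.
            self.enabled -= 1
        return self.enabled
```

The periodic growth step is what lets the queue recover when a program phase with more available parallelism begins; without it, the controller could only shrink the queue.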
4.6.5 Other Power Optimizations for the Instruction Queue
At about the same time as the Buyuktosunoglu et al. and the Folegnani and Gonzalez papers,
a slew of techniques were proposed to reduce power in instruction queues. Some of them are at
the circuit level but motivated by architectural characteristics, such as the techniques proposed
by Kucuk, Ghose, Ponomarev, and Kogge [145]. They propose three techniques to reduce IQ
dynamic power: (i) efficient comparators in the CAM part, (ii) significance compression in the
SRAM part (which is another example of the technique described in Section 4.3, “Idle-Width
Switching Activity: Core”), and (iii) bit-line segmentation (which is explained in “Sidebar:
Bit-line Segmentation”). Their first proposal goes beyond disabling the tag match for empty
entries and ready operands. It attempts to minimize tag match energy for all those entries
that do participate in the tag comparison but do not match. The technique exploits the prevailing
per-bit behavior of typical programs. Because of the localization of dependencies in a program,
a mismatch is much more likely to occur in the least significant bits of an operand tag. This
means that just checking the lower four out of the eight tag bits can reveal a mismatch 90%
of the time, saving, in this case, half of the power of a full comparison.
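The early-mismatch idea can be illustrated in software as a two-stage comparison. This is a behavioral sketch, not the comparator circuit from the cited work: the function name `two_stage_tag_match` is hypothetical, and the count of bits compared is used only as a rough stand-in for comparison energy.

```python
def two_stage_tag_match(entry_tag, broadcast_tag, low_bits=4, tag_bits=8):
    """Compare two tags, checking the low-order bits first.

    Returns (matched, bits_compared). bits_compared is a crude proxy
    for energy: a mismatch found in the low bits costs only low_bits.
    """
    mask = (1 << low_bits) - 1
    if (entry_tag & mask) != (broadcast_tag & mask):
        # Mismatch detected early; the upper bits are never examined.
        return False, low_bits
    # Low bits agree, so the full tag must be checked.
    return entry_tag == broadcast_tag, tag_bits


# Usage: tags that differ in the low nibble are rejected after 4 bits.
print(two_stage_tag_match(0x35, 0x36))  # (False, 4)
print(two_stage_tag_match(0x35, 0x45))  # (False, 8)
print(two_stage_tag_match(0x35, 0x35))  # (True, 8)
```

Only the first case benefits from the shortcut. According to the text, it is also by far the most common case for mismatching entries.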