Optimizing Capacitance and Switching Activity to Reduce Dynamic Power - Computer Architecture Techniques for Power-Efficiency

Despite the magnitude of these savings—the low-hanging fruit in this case—Folegnani

and Gonzalez go one step further. They resize the IQ to fit program needs. The interesting

difference from the earlier proposals is that the IQ is resized logically, not physically, by partitioning

and selectively disabling parts of it. Besides the head and tail pointers, they introduce a new

pointer, called the limit pointer, which always moves at a fixed offset from the head pointer.

This pointer limits the logical size of the instruction queue by excluding the entries between

the head pointer and itself from being allocated. Figure 4.15 shows the new “disabled” area

defined by that pointer. This adds a known number (the offset from the head

pointer) of guaranteed empty entries that will not participate in tag matching. The question

now is how to maximize the disabled area without negatively impacting performance.
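
The effect of the limit pointer can be illustrated with a small behavioral model. This is only a sketch under simplifying assumptions: the class name `LogicalIQ`, its methods, and the in-order removal of entries at the head are hypothetical choices made for illustration, not details taken from the Folegnani and Gonzalez design.

```python
class LogicalIQ:
    """Circular queue whose usable size is limited logically (illustrative model)."""

    def __init__(self, size, disabled):
        self.size = size          # physical number of entries
        self.disabled = disabled  # fixed offset between limit and head pointers
        self.head = 0             # index of the oldest entry
        self.count = 0            # number of occupied entries

    @property
    def tail(self):
        # Next slot to allocate.
        return (self.head + self.count) % self.size

    @property
    def limit(self):
        # The limit pointer trails the head by `disabled` slots, so the
        # slots from limit up to (but excluding) head are never allocated.
        return (self.head - self.disabled) % self.size

    def allocate(self):
        """Return the allocated slot index, or None if the logical queue is full."""
        if self.count >= self.size - self.disabled:
            return None
        slot = self.tail
        self.count += 1
        return slot

    def release_oldest(self):
        """Free the entry at the head; the limit pointer moves along with it."""
        if self.count == 0:
            raise IndexError("queue is empty")
        self.head = (self.head + 1) % self.size
        self.count -= 1
```

With 8 physical entries and 3 disabled, only 5 slots can be occupied at once, and the 3 slots between the limit and head pointers are guaranteed empty, so in hardware they could be kept out of tag matching.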

ILP-contribution feedback control: The disabled area is sized using a heuristic with empirically chosen

parameters. The IQ is logically divided into 16 partitions. The idea for the heuristic is to measure

the contribution to performance from the youngest partition of the IQ, which is the partition

allocated most recently at the tail pointer. The contribution of a partition is measured in terms

of issued instructions from this partition within a time window. If that contribution is below

some empirically chosen threshold, then the effective size of the IQ is reduced by expanding

the disabled area. Periodically the effective IQ size is increased (by contracting the disabled

area). This simple scheme increases the energy savings to about 91% with a modest 1.7% IPC

loss.
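
The feedback heuristic described above can be sketched as a small controller that is invoked once per time window. The names `IQResizeController` and `end_of_window`, and the default values for `threshold` and `grow_period`, are assumptions made for this example; the text only states that the parameters are chosen empirically.

```python
class IQResizeController:
    """Shrink the IQ when its youngest partition contributes little; grow it periodically."""

    def __init__(self, partitions=16, threshold=4, grow_period=5):
        self.partitions = partitions    # total logical partitions (16 in the text)
        self.enabled = partitions       # partitions currently enabled
        self.threshold = threshold      # hypothetical: minimum issues per window
        self.grow_period = grow_period  # hypothetical: windows between size increases
        self.windows = 0

    def end_of_window(self, issued_from_youngest):
        """Update the effective IQ size given the youngest partition's issue count."""
        self.windows += 1
        # Low contribution: expand the disabled area by one partition.
        if issued_from_youngest < self.threshold and self.enabled > 1:
            self.enabled -= 1
        # Periodically contract the disabled area to re-probe for available ILP.
        if self.windows % self.grow_period == 0 and self.enabled < self.partitions:
            self.enabled += 1
        return self.enabled


controller = IQResizeController()
sizes = [controller.end_of_window(n) for n in [0, 0, 10, 0, 10]]
```

The periodic increase matters because the controller only observes the youngest enabled partition: once a partition is disabled, its potential contribution can no longer be measured, so the size has to be raised again from time to time to test whether the program could now use a larger queue.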

4.6.5 Other Power Optimizations for the Instruction Queue

At about the same time as the Buyuktosunoglu et al. and the Folegnani and Gonzalez papers,

a slew of techniques were proposed to reduce power in instruction queues. Some of them are at

the circuit level but motivated by architectural characteristics, such as the techniques proposed

by Kucuk, Ghose, Ponomarev, and Kogge [145]. They propose three techniques to reduce IQ

dynamic power: (i) efficient comparators in the CAM part, (ii) significance compression in the

SRAM part (which is another example of the technique described in Section 4.3, “Idle-Width

Switching Activity: Core”), and (iii) bit-line segmentation (which is explained in “Sidebar:

Bit-line Segmentation”). Their first proposal goes beyond disabling the tag match for empty

entries and ready operands. It attempts to minimize tag match energy for all those entries

that do participate in the tag comparison but do not match. The technique exploits the prevailing

per-bit behavior of typical programs. Because of the localization of dependencies in a program,

a mismatch is much more likely to occur in the least significant bits of an operand tag. This

means that just checking the lower four out of the eight tag bits can reveal a mismatch 90%

of the time, saving, in this case, half of the power of a full comparison.
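
The arithmetic behind this observation can be illustrated with a two-stage comparison. The sketch below is a behavioral model only, not the comparator circuit from [145]: the function names are hypothetical, and it assumes that comparison energy is proportional to the number of bits examined.

```python
def staged_tag_match(tag_a, tag_b, width=8, low_bits=4):
    """Compare the low bits first; examine the high bits only if the low bits agree.

    Returns (match, bits_examined).
    """
    low_mask = (1 << low_bits) - 1
    if (tag_a ^ tag_b) & low_mask:
        return False, low_bits   # mismatch detected early
    return tag_a == tag_b, width  # full comparison was needed


def expected_bits_per_mismatch(p_low_mismatch, width=8, low_bits=4):
    """Average bits examined per mismatching entry.

    Assumption: a fraction p_low_mismatch of mismatches is caught by the low bits.
    """
    return low_bits + (1.0 - p_low_mismatch) * (width - low_bits)


print(staged_tag_match(0b10110101, 0b10110110))  # low bits differ
print(expected_bits_per_mismatch(0.9))
```

Under these assumptions, a mismatch caught in the low four bits examines half of the eight tag bits, and with 90% of mismatches caught early the average is roughly 4.4 bits per mismatching entry instead of 8.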
