Optimizing Capacitance and Switching Activity to Reduce Dynamic Power - Computer Architecture Techniques for Power-Efficiency

assumed to have support for SIMD instructions, e.g., it can segment the carry chain at 16-bit

intervals. The overhead for this technique consists of the set of multiplexors to shift operands

into high-order bits and the result to low order bits. Additionally, increased complexity is

required in the issue logic.
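
The packing mechanism described above can be sketched in software. This is a minimal illustration, not the hardware design: it assumes unsigned 16-bit narrow operands and a 32-bit adder, and the names `segmented_add32` and `packed_add` are hypothetical.

```python
MASK16 = 0xFFFF

def segmented_add32(x, y):
    """32-bit add with the carry chain cut at bit 16.

    The two 16-bit halves are added independently, so a carry out of
    the low half never reaches the high half (SIMD-style behaviour).
    """
    lo = (x & MASK16) + (y & MASK16)
    hi = ((x >> 16) & MASK16) + ((y >> 16) & MASK16)
    return ((hi & MASK16) << 16) | (lo & MASK16)

def packed_add(a1, b1, a2, b2):
    """Execute a1 + b1 and a2 + b2 in a single pass through the adder.

    The shifts model the multiplexers: the second operation's operands
    are moved into the high-order bits, and its result is moved back
    down to the low-order bits afterwards.
    """
    x = ((a2 & MASK16) << 16) | (a1 & MASK16)
    y = ((b2 & MASK16) << 16) | (b1 & MASK16)
    r = segmented_add32(x, y)
    return r & MASK16, (r >> 16) & MASK16

print(packed_add(3, 4, 100, 200))
```

In this sketch each result wraps around within its own 16-bit lane; a carry out of the low lane is simply dropped rather than corrupting the second result.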

Speculative-operation packing: Packing operands together effectively increases the CPU's

issue bandwidth but the resulting speedup is small. The problem is that both operands of an

operation are required to be narrow-width in order for packing to be considered. This is too

restrictive. The odds of packing operations together can be significantly improved if we simply

require at least one operand to be narrow width, but not necessarily both. There is a good

chance that an operation between a narrow and a wide operand will not affect high-order bits.

In this case, the high-order bits of the wide operand can be carried over to the result, while its

low-order bits form a narrow operand.

It would be, however, too complex to guarantee, before performing the actual operation,

that the high-order bits remain intact. The solution is provided by architectures supporting

speculative execution: operations can be packed speculatively, as if their arguments were all

narrow-width. If something goes wrong, the packed operations are squashed and re-executed

(replayed) separately. The telltale sign of something going wrong is an overflow in the segmented

carry chain, meaning that high-order bits are indeed affected by the operation.
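
The speculate-then-replay idea can be sketched as follows. This is a simplified model under stated assumptions: 32-bit unsigned addition, a 16-bit segment boundary, and a narrow operand whose high-order bits are all zero. The function name `speculative_add` and the returned replay flag are hypothetical, introduced only for illustration.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def is_narrow(v):
    """Assumed definition: the upper 16 bits of the 32-bit value are zero."""
    return (v & MASK32) <= MASK16

def speculative_add(wide, narrow):
    """Add a wide and a narrow operand as if the operation were narrow.

    Returns (result, replayed). Only the low 16 bits go through the
    adder; the high-order bits of the wide operand are carried over
    unchanged. A carry out of the 16-bit segment means the high-order
    bits would have changed, so the operation is squashed and replayed
    at full width.
    """
    assert is_narrow(narrow)
    low_sum = (wide & MASK16) + (narrow & MASK16)
    if low_sum > MASK16:
        # Overflow in the segmented carry chain: misspeculation.
        return (wide + narrow) & MASK32, True
    high_bits = wide & MASK32 & ~MASK16
    return high_bits | low_sum, False

print(speculative_add(0x12340001, 0x0002))
print(speculative_add(0x1234FFFF, 0x0001))
```

Either way the value returned equals the full-width sum; the flag only records whether the cheap packed path succeeded or a replay was needed.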

This optimization brings the speedup of packing narrow-width operations to approximately

4% for SPECint95 and 8% for Mediabench for an Alpha-class, 4-instruction-wide,

superscalar CPU [37]. Speedup increases with the width of the machine as more instructions

become available to choose from and pack together. This is a good result considering that the

55-58% power savings mentioned above for the first clock gating technique concern the integer

units only, which in reality consume about 10% of the processor's total power.

4.3.2 Significance Compression

Until now we have discussed a fixed-width definition (16 bits) of narrow-width operands and

allowed for significant bits only in low-order positions. Relaxing these two constraints leads to

a more general approach proposed by Canal and Gonzalez [44], called significance compression.

The idea is to compress non-significant bits (strings of zeros or ones) anywhere they appear in

the full width of an operand. Each 32-bit word is augmented with a 3-bit tag describing the

“significance” of each of its four bytes. A byte can be either significant or a sign extension of its

preceding byte (i.e., just a string of zeros or ones). Of course, the first low-order byte cannot

be a sign extension of any other byte and is always taken to be significant. The tags encode the

significance of a byte in the manner shown in Table 4.2.
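
The per-byte significance rule stated above can be sketched as a small function. This is an illustration only: it derives one flag per byte and does not reproduce the actual 3-bit tag encoding of Table 4.2. The function name `byte_significance` is hypothetical.

```python
def byte_significance(word):
    """Return [s0, s1, s2, s3], one flag per byte of a 32-bit word.

    s0 (the low-order byte) is always significant. A higher byte is
    non-significant when it is a sign extension of the byte below it,
    i.e. all zeros or all ones matching that byte's top bit.
    """
    word &= 0xFFFFFFFF
    data = [(word >> (8 * i)) & 0xFF for i in range(4)]
    flags = [True]
    for i in range(1, 4):
        sign_fill = 0xFF if data[i - 1] & 0x80 else 0x00
        flags.append(data[i] != sign_fill)
    return flags

for value in (0x00000005, 0xFFFFFF80, 0x00010000, 0x12345678):
    print(hex(value), byte_significance(value))
```

The three flags for bytes 1 to 3 carry the information that the 3-bit tag has to encode. The value 0x00010000 shows a non-significant byte sitting below a significant one, a case the fixed 16-bit narrow-width definition cannot exploit.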

Canal and Gonzalez report that the majority of values (87%) in SPECint and Mediabench

benchmarks can be compressed with significance compression [44], although a good 75% of
