assumed to have support for SIMD instructions, e.g., it can segment the carry chain at 16-bit
intervals. The overhead for this technique consists of the set of multiplexors to shift operands
into high-order bits and the result to low-order bits. Additionally, increased complexity is
required in the issue logic.
Speculative-operation packing: Packing operands together effectively increases the CPU's
issue bandwidth but the resulting speedup is small. The problem is that both operands of an
operation are required to be narrow-width in order for packing to be considered. This is too
restrictive. The odds of packing operations together can be significantly improved if we simply
require at least one operand to be narrow-width, but not necessarily both. There is a good
chance that an operation between a narrow and a wide operand will not affect high-order bits.
In this case, the high-order bits of the wide operand can be carried over to the result, while its
low-order bits form a narrow operand.
It would, however, be too complex to guarantee, before performing the actual operation,
that the high-order bits remain intact. The solution is provided by architectures supporting
speculative execution: operations can be packed speculatively, as if their arguments were all
narrow-width. If something goes wrong, the packed operations are squashed and re-executed
(replayed) separately. The telltale sign of something going wrong is an overflow in the segmented
carry chain, meaning that high-order bits are indeed affected by the operation.
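
The replay condition described above can be illustrated with a minimal sketch. It is not the
hardware design from [37]; it models only unsigned addition of one wide and one narrow operand,
and the function name, the tuple return value, and the 64-bit default word size are assumptions
made for illustration.

```python
NARROW_BITS = 16                       # segment size of the carry chain (from the text)
NARROW_MASK = (1 << NARROW_BITS) - 1


def speculative_packed_add(wide, narrow, word_bits=64):
    """Model a speculatively packed unsigned add (hypothetical helper).

    Returns (result, replayed). `replayed` is True when a carry leaves the
    low 16-bit segment, i.e. the high-order bits of `wide` would change and
    the operation has to be squashed and re-executed at full width.
    """
    word_mask = (1 << word_bits) - 1
    assert 0 <= narrow <= NARROW_MASK and 0 <= wide <= word_mask

    # Speculative step: only the low 16-bit segment takes part in the add.
    low_sum = (wide & NARROW_MASK) + narrow

    if low_sum >> NARROW_BITS:
        # Overflow in the segmented carry chain: replay as a full-width add.
        return (wide + narrow) & word_mask, True

    # No overflow: the high-order bits of the wide operand carry over unchanged.
    return (wide & ~NARROW_MASK & word_mask) | low_sum, False
```

In this model, adding 0x0002 to 0x12340001 stays inside the low segment and needs no replay,
whereas adding 0x0001 to 0x1234FFFF produces a carry out of bit 15 and triggers the replay path.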
This optimization brings the speedup of packing narrow-width operations to approximately
4% for SPECint95 and 8% for Mediabench for an Alpha-class, 4-instruction-wide,
superscalar CPU [37]. Speedup increases with the width of the machine as more instructions
become available to choose from and pack together. This is a good result considering that the
55-58% power savings mentioned above for the first clock gating technique concern the integer
units only, which in reality consume only about 10% of the processor's total power, so those
savings amount to roughly 5.5-5.8% of the total.
4.3.2 Significance Compression
Until now we have discussed a fixed-width definition (16 bits) of narrow-width operands and
allowed for significant bits only in low-order positions. Relaxing these two constraints leads to
a more general approach proposed by Canal and Gonzalez [44], called significance compression.
The idea is to compress non-significant bits (strings of zeros or ones) anywhere they appear in
the full width of an operand. Each 32-bit word is augmented with a 3-bit tag describing the
“significance” of each of its four bytes. A byte can be either significant or a sign extension of
the preceding (next lower-order) byte (i.e., just a string of zeros or ones). Of course, the
lowest-order byte cannot be a sign extension of any other byte and is always taken to be
significant. The tags encode the
significance of a byte in the manner shown in Table 4.2.
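
As a rough illustration of the tagging idea, the sketch below computes a 3-bit tag for a 32-bit
word and reconstructs the word from the tag and its significant bytes. Table 4.2 is not
reproduced here, so the encoding is an assumption: one tag bit per upper byte, set when that byte
is a sign extension of the byte below it. The function names are likewise hypothetical.

```python
def compress(word):
    """Return (tag, significant_bytes) for a 32-bit word.

    Assumed encoding (not necessarily that of Table 4.2): tag bit i-1 is set
    when byte i is a sign extension of byte i-1, for i = 1..3. Byte 0 is
    always significant and always stored.
    """
    b = [(word >> (8 * i)) & 0xFF for i in range(4)]   # b[0] is the low-order byte
    tag = 0
    kept = [b[0]]
    for i in range(1, 4):
        # A sign extension repeats the top bit of the byte below it.
        ext = 0xFF if b[i - 1] & 0x80 else 0x00
        if b[i] == ext:
            tag |= 1 << (i - 1)        # compressible: nothing stored for this byte
        else:
            kept.append(b[i])          # significant: must be stored
    return tag, kept


def decompress(tag, kept):
    """Rebuild the 32-bit word from the tag and the stored bytes."""
    stored = iter(kept)
    b = [next(stored)]
    for i in range(1, 4):
        if tag & (1 << (i - 1)):
            b.append(0xFF if b[i - 1] & 0x80 else 0x00)
        else:
            b.append(next(stored))
    return sum(byte << (8 * i) for i, byte in enumerate(b))
```

Under this assumed encoding, 0x00000004 keeps only its low byte, while 0x10000004 keeps the low
and the high byte and compresses the two zero bytes in between, a case that a fixed 16-bit
narrow-width test would not capture.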
Canal and Gonzalez report that the majority of values (87%) in SPECint and Mediabench
benchmarks can be compressed with significance compression [44], although a good 75% of