[Figure: block diagram. Operand A and Operand B are each split between a High latch (bits 63-16, 48 bits) and a Low latch (bits 15-0, 16 bits) feeding the FU. Signals shown: CLK, A-zero-48, B-zero-48. A Zero Detect block on the 64-bit Result (to registers) produces Result-zero-48.]
FIGURE 4.8: Clock-gating ALUs for narrow-width operands. Adapted from [37].
if both operands of an operation are tagged as narrow [37]. Potentially, part of the ALU could
be safely disabled even if only one of the arguments is narrow, but it would be too complicated
to guarantee this beforehand. If both arguments are narrow, only their 16 lower bits are latched
into the ALU latches. This ensures that no switching occurs in the 48 upper bits and is akin to
clock gating the unused portion of the ALU (for static logic). To create the correct wide result,
the appropriate bits (zeros or ones) are multiplexed onto the result bus. This technique yields
significant power savings for the CPU's integer unit, which comprises an adder, a Booth multiplier,
bit-wise logic, and a shifter. Specifically, in an Alpha-class, 4-instruction-wide superscalar processor,
the average power consumption of the integer units can be reduced by 55% and 58% for the
SPECint95 and the Mediabench benchmark suites, respectively [37].
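The gating decision described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the circuit from [37]: the names `is_narrow` and `gated_add` are hypothetical, only the all-zeros case of the upper 48 bits is modeled (the all-ones case is omitted), and the second return value merely records whether the high latches would have been clocked.

```python
MASK64 = (1 << 64) - 1          # 64-bit datapath
LOW_BITS = 16                   # width of the always-latched low part
LOW_MASK = (1 << LOW_BITS) - 1


def is_narrow(value: int) -> bool:
    """True if the upper 48 bits of a 64-bit value are all zero (assumed tag rule)."""
    return ((value & MASK64) >> LOW_BITS) == 0


def gated_add(a: int, b: int) -> tuple[int, bool]:
    """Add two 64-bit operands; return (result, high_latches_clocked).

    When both operands are narrow, only the low 16 bits are "latched",
    so the upper 48 bits of the ALU inputs see no switching in this model.
    """
    if is_narrow(a) and is_narrow(b):
        # Only the low halves enter the ALU; the upper input bits stay at zero.
        result = ((a & LOW_MASK) + (b & LOW_MASK)) & MASK64
        return result, False
    # At least one wide operand: the full 64-bit datapath is used.
    return (a + b) & MASK64, True


if __name__ == "__main__":
    print(gated_add(3, 4))
    print(gated_add(1 << 20, 1))
```

In this model the gated path is taken only when both operands pass the narrow test, mirroring the text's point that gating on a single narrow operand is not attempted.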
Operation packing: packing narrow-width values: While Value Gating adjusts the width
of the machine to the operand width, in this technique two operations with narrow-width
operands are issued simultaneously to a single full-width ALU. This can increase performance
(if there is contention for the ALUs) without incurring significant power overhead (since
switching activity remains approximately the same) and, as a consequence, improve EDP.
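The idea of sharing one full-width ALU between two narrow operations can be sketched in software as a single 64-bit addition that carries two independent 16-bit additions. This is a minimal illustration under assumptions, not the hardware design: `packed_add` is a hypothetical name, narrow operands are taken to be non-negative 16-bit values, and the second operation is placed at bit 32 (an assumed lane position) so that a 17-bit sum in the low lane cannot spill into the high lane.

```python
MASK64 = (1 << 64) - 1
NARROW_MASK = 0xFFFF            # narrow operand: fits in 16 bits
LANE_SHIFT = 32                 # assumed position of the second operation
LANE_MASK = (1 << LANE_SHIFT) - 1


def packed_add(a0: int, b0: int, a1: int, b1: int) -> tuple[int, int]:
    """Compute a0 + b0 and a1 + b1 with one 64-bit addition.

    Operation 0 stays in the low-order bits; operation 1 is shifted
    into the high-order bits of both ALU inputs.
    """
    for v in (a0, b0, a1, b1):
        if v & ~NARROW_MASK:
            raise ValueError("all operands must be narrow (16-bit, non-negative)")
    alu_in_a = (a1 << LANE_SHIFT) | a0
    alu_in_b = (b1 << LANE_SHIFT) | b0
    total = (alu_in_a + alu_in_b) & MASK64   # the single full-width operation
    return total & LANE_MASK, total >> LANE_SHIFT


if __name__ == "__main__":
    print(packed_add(1, 2, 3, 4))
```

Because each lane's sum needs at most 17 bits, the two results never overlap in this layout, which is why one add suffices here; a real ALU would instead cut the carry chain between lanes.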
The implementation is simple: the issue logic detects two instructions that perform identical
operations, are ready to issue, and have all of their operands tagged as narrow-width. A
set of multiplexors shifts the significant part of the operands of one of the operations into the
high-order bits of the ALU inputs. The significant parts of the operands of the other operation
remain at their normal position in the low-order bits. The combined operations are executed in
the ALU in SIMD mode, similar to SIMD multimedia extension instructions. The CPU is