application of clock gating, more sophisticated techniques to reduce cache power are presented
in Sections 4.4 and 4.9.
Finally, information from the Execute stage is used two cycles later to clock-gate the
Writeback stage. In the Writeback stage, data is put on the result bus and routed to the
Instruction queue to wake up any waiting instructions. Clock gating in the Writeback stage
is somewhat different from that in other dynamic-logic stages: power is consumed only when bus
lines switch logic levels. Additional techniques to reduce bus switching are presented in Section
4.12. To prevent spurious switching when the result bus is idle, the data latches feeding the
bus-line drivers are clock-gated to shield bus lines from any changes.
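The effect of gating the latches that feed the bus drivers can be illustrated with a small Python model. This is only a sketch under simplifying assumptions: the function name `count_bus_toggles`, the single latch, and the per-cycle `valid` flag are hypothetical and are not taken from Li et al., and the two-cycle delay between Execute and Writeback is not modeled.

```python
def count_bus_toggles(values, valid, gated):
    """Count bit transitions on a result bus driven from a data latch.

    values: per-cycle word presented at the latch input (ints)
    valid:  per-cycle flag, True when a real result is written back
    gated:  if True, the latch clock is gated in cycles where valid is
            False, so the latch (and hence the bus) holds its old value

    All names here are illustrative assumptions, not the paper's design.
    """
    latch = 0      # value currently driven onto the bus
    toggles = 0    # number of bus lines that changed logic level
    for value, is_valid in zip(values, valid):
        if is_valid or not gated:
            new = value    # latch is clocked and captures its input
        else:
            new = latch    # clock is gated: latch keeps its contents
        toggles += bin(latch ^ new).count("1")  # lines that switched
        latch = new
    return toggles
```

For the input trace `[0xFF, 0x00, 0xFF, 0x0F]` with only the first and last cycles valid, the ungated bus switches 28 lines while the gated bus switches 12, because the spurious values in the idle cycles never reach the bus.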
Li et al. evaluate deterministic clock gating (DCG) [ 152 ] with Wattch. By applying
DCG to all the latches and stages described above, they report power savings of 21% and 19%
(on average) for the SPEC2000 integer and floating point benchmarks, respectively. They also
compare deterministic clock gating to a predictive clock gating technique, Pipeline Balancing
(PB) [ 19 ], which is presented in detail in Section 4.7. Pipeline Balancing adjusts the width of
the superscalar pipeline by gating functional unit clusters to match the needs of programs. Because
it is a coarser-grained technique, it misses some opportunities to gate idle hardware and, according
to Li et al., achieves power savings of less than 10%. Because PB is also a predictive technique,
it can mispredict and hurt performance (incurring a 2-3% slowdown on average), which
is why DCG fares even better than Pipeline Balancing in terms of energy-delay product (EDP).
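To see why a slowdown weighs so heavily on EDP, recall that EDP is energy times delay and energy is average power times delay, so EDP scales as power times delay squared. The sketch below plugs in the figures quoted above. Treating "less than 10%" as exactly 10% and "2-3%" as 2.5% are assumptions made purely for illustration, as is the helper name `relative_edp`; it also assumes the reported savings apply to average total power.

```python
def relative_edp(power_saving, slowdown):
    """Return EDP relative to an ungated baseline (baseline = 1.0).

    power_saving: fractional reduction in average power (0.21 = 21%)
    slowdown:     fractional increase in execution time (0.025 = 2.5%)

    EDP = energy * delay = (power * delay) * delay = power * delay**2
    """
    rel_power = 1.0 - power_saving
    rel_delay = 1.0 + slowdown
    return rel_power * rel_delay ** 2


# Illustrative inputs only: DCG on the SPEC2000 integer benchmarks versus
# an assumed upper bound for PB (10% savings, 2.5% slowdown).
dcg_edp = relative_edp(0.21, 0.0)
pb_edp = relative_edp(0.10, 0.025)
```

Under these assumptions the relative EDP is about 0.79 for DCG and about 0.95 for PB: the quadratic delay term erodes roughly half of PB's power savings, while DCG, having no slowdown, keeps all of its savings.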
4.2.4 Clock gating examples
Today, virtually all processor designs use clock gating to some degree. Interestingly,
not only low-power designs but also many high-performance processors use clock gating
extensively, because it has essentially no impact on performance. Two prominent examples
(one high-performance, the other low-power) are IBM's Power5 [ 57 ] and Intel's XScale
[ 58 ] processors.
Power5: Dynamic clock gating is used extensively in this high-performance IBM processor
[ 57 ]. According to IBM, the use of clock gating yields a reduction in switching power
of more than 25% without affecting either performance or frequency. The larger the unit that
is clock gated, the more likely it is to cause di/dt problems, i.e., large swings in the current
of the power rails that charge these units when the clock signal is reinstated. For this reason,
the Power5 implements fine-grain gating domains, limiting the induced noise. All clock gating
events are programmable, which allows extensive control over clock gating. Figure 4.5 shows a
clock gating circuit (adapted from [ 57 ]) of the Power5. There are both global and local clock
gating enable signals but the actual gating decision is taken by logic dedicated to each gated
unit. This logic produces the Dynamic Stop signal depending on the usage characteristics of the
gated unit.
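The interplay of the enable signals and the per-unit Dynamic Stop signal can be sketched as a simple Boolean model. The actual circuit of Figure 4.5 is not reproduced here, so the way the signals are combined below (both enables must be set for a Dynamic Stop request to take effect) is an assumption made for illustration, and the function and parameter names are hypothetical.

```python
def gated_clock(clock, global_enable, local_enable, dynamic_stop):
    """Return the clock level delivered to one gated unit.

    clock:         current level of the free-running clock
    global_enable: chip-wide permission to use clock gating
    local_enable:  per-unit permission to use clock gating
    dynamic_stop:  request from the unit's own logic to stop its clock

    Assumed combination (not taken from the Power5 documentation):
    the clock is stopped only when gating is enabled both globally and
    locally AND the unit's logic asserts dynamic_stop.
    """
    stop = global_enable and local_enable and dynamic_stop
    return clock and not stop
```

In this model, clearing either enable makes the unit see the free-running clock regardless of Dynamic Stop, which mirrors the idea that gating events are programmable while the gating decision itself stays local to each unit.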