FIGURE 3.4: Slack in CPU due to memory operations. Reproduced from [226]. Copyright 2005 IEEE.
processing unit. On average, their results achieved an energy-delay product (EDP) improvement
(over non-DVFS approaches) of 22.4% for SPEC95 FP, 21.5% for SPEC2K FP, 6.0%
for SPEC2K INT, and 22.7% for Olden benchmarks. These represent three to five times better
results than a baseline approach based on static DVFS decisions.
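As a reminder of the metric behind these figures, the energy-delay product is energy multiplied by execution time, so a lower value is better. The sketch below computes a relative EDP improvement under the assumption that "improvement" means the fractional reduction in EDP relative to a baseline. The function names and the sample energy and delay values are hypothetical and are not taken from the cited work.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Return the energy-delay product (EDP); lower is better."""
    return energy_joules * delay_seconds


def edp_improvement(baseline_edp: float, new_edp: float) -> float:
    """Fractional EDP reduction relative to the baseline (assumed definition)."""
    return (baseline_edp - new_edp) / baseline_edp


# Hypothetical run: DVFS lowers energy from 10 J to 7 J
# while stretching execution time from 2.0 s to 2.2 s.
baseline = energy_delay_product(10.0, 2.0)
scaled = energy_delay_product(7.0, 2.2)
improvement = edp_improvement(baseline, scaled)
print(f"EDP improvement: {improvement:.1%}")
```

Because EDP weights energy and delay equally, a DVFS policy improves it only when the energy saved outweighs the slowdown it causes.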
3.3.3 Coarse-Grained Analysis Based on Power Phases
The previously discussed compiler approaches used detailed offline or online program analysis
to discern useful DVFS adjustment points. The online techniques of Wu et al. achieved their
detailed program knowledge through relatively high-overhead dynamic monitoring. Thus, it is
tempting to look for techniques that maintain such detailed knowledge but reduce monitoring
overhead. Since most general-purpose processors include a suite of user-readable hardware
performance counters, it is possible to build up a history of program behavior from
aggregate event counts.
In particular, early work by Isci and Martonosi demonstrated how these event counts can
be viewed as identifying “fingerprints” of program phase behavior [115]. Essentially, this work
aggregated power data based on different hardware counters into a summation of different
power subcomponents. If each subcomponent is treated as one dimension in a vector space,
then these so-called power vectors can be used to identify unique aspects of power behavior
that call for different management approaches.
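The power-vector idea can be illustrated with a minimal sketch. It assumes each power subcomponent is estimated as a linear weight times one counter value, and that samples are grouped into phases by a thresholded Manhattan distance to the first vector seen in each phase. Both choices, along with the component names, weights, and threshold, are hypothetical and are not the method of the cited work.

```python
def power_vector(counts, weights):
    """Map aggregate event counts to per-component power estimates.

    Each component becomes one dimension of the vector; components are
    ordered by name so that vectors are comparable across samples.
    """
    return [weights[name] * counts[name] for name in sorted(weights)]


def manhattan_distance(a, b):
    """Sum of absolute per-dimension differences between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify_phases(vectors, threshold):
    """Assign each vector the id of the first phase representative within
    `threshold`; otherwise start a new phase with this vector."""
    representatives = []
    labels = []
    for vec in vectors:
        for phase_id, rep in enumerate(representatives):
            if manhattan_distance(vec, rep) <= threshold:
                labels.append(phase_id)
                break
        else:
            representatives.append(vec)
            labels.append(len(representatives) - 1)
    return labels


# Hypothetical per-event power weights and counter samples.
weights = {"alu": 0.5, "cache": 1.0, "mem": 2.0}
samples = [
    {"alu": 10, "cache": 2, "mem": 1},  # compute-heavy interval
    {"alu": 9, "cache": 2, "mem": 1},   # similar to the first
    {"alu": 2, "cache": 6, "mem": 5},   # memory-heavy interval
    {"alu": 10, "cache": 3, "mem": 1},  # compute-heavy again
]
vectors = [power_vector(s, weights) for s in samples]
labels = classify_phases(vectors, threshold=2.0)
print(labels)
```

In this sketch the memory-heavy interval lands in its own phase, which is the kind of distinction a power manager could use to choose a different DVFS setting.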
More recently, Isci, Contreras, and Martonosi elaborated on their technique by including
a predictor table that can predict future power behavior based on recently observed values [116].
This so-called Global Phase History Table (GPHT) is inspired by hardware branch predictors,
but is implemented in software by the operating system. Like a branch predictor, it stores a
“history table” of recently measured application metrics that are predictive of proper DVFS
adjustments. For example, one prototype implementation measured “memory operations per