alone. In either scenario, research questions arise about whether to make offline (e.g.,
compile-time) decisions about DVFS settings, versus online, reactive approaches.
(3) What is the hardware granularity at which voltage and frequency can be controlled?
This question is closely related to the question above. The bulk of the DVFS research has
focused on cases in which the entire processor core operates at the same (V, f) setting but
is asynchronous to the "outside world," such as main memory. In such scenarios, the main
goal of DVFS is to capitalize on cases in which the processor's workload is heavily memory-
bound. In these cases, the processor is often stalled waiting on memory, so reducing its supply
voltage and clock frequency will reduce power and energy without having significant impact on
performance.
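This tradeoff can be made concrete with a toy first-order model. The sketch below is illustrative only: the workload numbers, effective capacitance, and (V, f) operating points are invented, not drawn from any real processor.

```python
# Toy model of why DVFS pays off for memory-bound code: execution time is
# compute time (scales with 1/f) plus memory stall time (roughly fixed, set
# by DRAM latency), while dynamic power scales roughly as C * V^2 * f.
# All numbers below are made up for illustration.

def run_time_s(compute_cycles, f_hz, mem_stall_s):
    """Execution time: core cycles at frequency f plus fixed memory stalls."""
    return compute_cycles / f_hz + mem_stall_s

def dynamic_energy_j(c_eff, v, f_hz, t_s):
    """Dynamic energy = power * time, with power ~ C * V^2 * f."""
    return c_eff * v * v * f_hz * t_s

# Hypothetical high and low (V, f) points, and a heavily memory-bound phase.
hi_v, hi_f = 1.2, 2.0e9
lo_v, lo_f = 0.9, 1.0e9
compute_cycles = 1e8      # little actual compute...
mem_stall_s    = 0.40     # ...mostly waiting on DRAM

t_hi = run_time_s(compute_cycles, hi_f, mem_stall_s)
t_lo = run_time_s(compute_cycles, lo_f, mem_stall_s)
e_hi = dynamic_energy_j(1e-9, hi_v, hi_f, t_hi)
e_lo = dynamic_energy_j(1e-9, lo_v, lo_f, t_lo)

print(f"slowdown: {t_lo / t_hi:.2f}x, energy ratio: {e_lo / e_hi:.2f}")
```

With these invented numbers, halving the frequency slows the phase by only about 11% while cutting dynamic energy by roughly two-thirds, because most of the runtime was memory stall time that the core frequency does not affect.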
Other work has considered cases in which multiple clock domains may exist on a chip.
These so-called MCD scenarios might either be multiple clock domains within a single
processor core [199, 200, 216, 227, 228] or chip multiprocessors in which each on-chip
processor core has a different voltage/clock domain [67]. This dimension is explored in Section 3.4.
(4) How do the implementation characteristics of the DVFS approach being used affect the
strategies to employ? Some of the implementation characteristics for DVFS can have significant
influence on the strategies an architect might choose, and the likely payoffs they might offer.
For example, what is the delay required to engage a new setting of (V, f)? (And, can the
processor continue to execute during the transition from one (V, f) pair to another?) If the
delay is very short, then simple reactive techniques may offer high payoff. If the delay is quite
long, however, then techniques based on more intelligent or offline analysis might make more
sense.
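When transitions are cheap, a reactive policy can be as simple as the following sketch of a hypothetical interval-based governor. The frequency steps, stall-ratio thresholds, and input trace are all invented; a real policy of this kind would live in the OS or firmware.

```python
# Sketch of a simple reactive DVFS policy: each interval, inspect the fraction
# of cycles stalled on memory, and step frequency down when the core is mostly
# waiting, up when it is compute-bound. This only makes sense if the (V, f)
# transition delay is short relative to the sampling interval.

FREQ_STEPS_HZ = [1.0e9, 1.5e9, 2.0e9]  # assumed discrete operating points

def next_freq_idx(idx, mem_stall_ratio, lo=0.3, hi=0.6):
    """One reactive step; the lo/hi thresholds are tuning knobs, not standards."""
    if mem_stall_ratio > hi and idx > 0:
        return idx - 1                      # mostly stalled: slow down
    if mem_stall_ratio < lo and idx < len(FREQ_STEPS_HZ) - 1:
        return idx + 1                      # compute-bound: speed back up
    return idx                              # in between: hold

# Per-interval memory-stall ratios for an imaginary workload: a memory-bound
# phase (high stall ratios) followed by a compute-bound phase (low ratios).
idx = 2
for ratio in [0.7, 0.8, 0.2, 0.1]:
    idx = next_freq_idx(idx, ratio)
print(FREQ_STEPS_HZ[idx])
```

Note how the policy steps down twice during the memory-bound intervals and climbs back up once the workload becomes compute-bound, ending at the top frequency.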
(5) How does the DVFS landscape change when considering parallel applications on multiple-core
processors? When considering a single-threaded application in isolation, one need only
consider the possible asynchrony between compute and memory. In other regards, reducing
the clock frequency proportionately degrades the performance. In a parallel scenario, however,
reducing the clock frequency of one thread may impact other dependent threads that are waiting
for a result to be produced. Thus, when considering DVFS for parallel applications, some notion
of critical path analysis may be helpful.
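A minimal illustration of the critical-path point, assuming a barrier-synchronized pair of threads with invented per-thread work amounts:

```python
# Toy illustration of why per-core DVFS for parallel code needs critical-path
# awareness: with a barrier, completion time is set by the slowest thread, so
# slowing a thread with slack saves energy "for free," while slowing the
# critical thread slows the whole program. All numbers are invented.

def barrier_time_s(work_cycles, freqs_hz):
    """Time to reach the barrier = max over threads of (work / frequency)."""
    return max(w / f for w, f in zip(work_cycles, freqs_hz))

work = [2.0e9, 1.0e9]  # thread 0 has twice the work: it is on the critical path

t_fast  = barrier_time_s(work, [2.0e9, 2.0e9])  # both threads at full speed
t_slack = barrier_time_s(work, [2.0e9, 1.0e9])  # slow only the short thread
t_crit  = barrier_time_s(work, [1.0e9, 2.0e9])  # slow the critical thread

print(t_fast, t_slack, t_crit)
```

Halving the frequency of the non-critical thread leaves the barrier time unchanged (it merely consumes its slack), whereas halving the critical thread's frequency doubles the time for everyone.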
A related question is whether continuous settings of (V, f) pairs are possible,
or whether these values can only be changed in fixed, discrete steps. If only discrete, step-wise
adjustments of (V, f) are possible, then the optimization space becomes difficult to navigate
because it is “non-convex.” As a result, simple online techniques might have difficulty finding
global optima, and more complicated or offline analysis again becomes warranted.
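The discrete case can be framed as a small combinatorial search. The sketch below brute-forces per-phase (V, f) assignments to minimize dynamic energy under a deadline; the operating points, workload, and energy model are all invented for illustration.

```python
# With only a few discrete (V, f) operating points, choosing a setting per
# program phase under a deadline is a combinatorial problem rather than a
# smooth optimization, so here we simply enumerate every assignment.
from itertools import product

OP_POINTS = [(0.9, 1.0e9), (1.1, 1.5e9), (1.2, 2.0e9)]  # (V, f), invented

def phase_cost(cycles, v, f):
    """Time and dynamic energy (~ C * V^2 * f * t) for one phase."""
    t = cycles / f
    e = 1e-9 * v * v * f * t
    return t, e

def best_schedule(phase_cycles, deadline_s):
    """Brute-force the discrete space: one (V, f) choice per phase."""
    best = None
    for setting in product(OP_POINTS, repeat=len(phase_cycles)):
        t_total = e_total = 0.0
        for cyc, (v, f) in zip(phase_cycles, setting):
            dt, de = phase_cost(cyc, v, f)
            t_total += dt
            e_total += de
        if t_total <= deadline_s and (best is None or e_total < best[0]):
            best = (e_total, setting)
    return best

# Two phases of 1e9 and 2e9 cycles under an invented 2.1 s deadline.
e_best, setting_best = best_schedule([1.0e9, 2.0e9], deadline_s=2.1)
print(setting_best)
```

In this toy model the winning schedule runs both phases at the middle operating point rather than mixing the extremes, which a greedy per-phase heuristic would not necessarily find; real cost models (adding leakage, transition overheads, and so on) only make the space lumpier.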
Because DVFS is available for experimentation on real systems [111, 112, 2], and because
it offers such high leverage in power/energy savings, it has been widely studied in a variety of
communities. Our discussion only touches on some of the key observations from the architectural