Information Technology Reference
In-Depth Information
Profile-assisted compiler approach : Hsu and Kremer's work provided a heuristic technique
that lowers the voltage for memory-bound sections [ 103 ]. The intuition behind their approach
is that if the processor and memory operate largely asynchronously from each other, then the
processor can be dialed down to much lower clock frequencies during memory-bound regions,
with considerable energy savings but no significant performance loss. They implemented their
technique within the SUIF2 source-to-source compiler infrastructure (gcc compilers were used
to generate object code).
The compiler algorithm is based on heuristics and profiling information to solve a
minimization problem. Using the author's description, the problem can be stated as follows:
Given a program P , find a program region R and a frequency f (lower than the maximum
frequency f max ) such that, if R is executed at the reduced frequency f and with reduced voltage,
the total execution time (including the voltage/frequency scaling overhead) is not
increased more than a small factor over the original execution time, and
the total energy usage is minimized.
Candidate regions are considered to be loop nests, call sites, called procedures, statement
sequences (straight-line code), or even the entire program. Restricting regions to the above
programming constructs has the benefit of making the number of DVFS switchings tractable,
since the number of times such regions execute can be determined with reasonable accuracy
either statically or by profiling . DVFS occurs only on entering and exiting a region. Finally,
candidate regions are selected by size, so DVFS switchings occur only for significantly large
pieces of code.
To implement a compiler algorithm to solve this minimization problem, two pieces of
information are needed for each candidate region R : an estimate of its execution time at a
frequency f , denoted T ( R
f ), and the number of times N ( R ) the region executes during the
lifetime of the program. T ( R
,
f )and N ( R ) are computed, depending on the programming
construct involved, according to the rules shown in Figure 3.1.
T ( R
,
f ) values for regions, that do not decompose further into smaller regions, are
provided by profiling—along with the N ( R ) values that cannot be computed statically. 2 Using
the T ( R
,
f )and N ( R ) information, compiler heuristics then select the appropriate regions to
annotate for DVFS.
Hsu and Kremer use an experimental setup to measure power in laptops (with Linux and
GNU compilers). With the help of a digital power meter and by annotating the programs with
mode-set instructions, which select DVFS settings on AMD mobile Athlon 4 and Transmeta
,
2 The authors cite analytic techniques to compute T ( R
,
f ) given information for T ( R
,
f max ), but these techniques
were not used in practice.
Search WWH ::




Custom Search