a variety of sources, including peak-to-typical power ratios for same-generation Intel
processors and extrapolations from available thermal data. They also assume that the variance in
typical power consumption increases in more complex cores because of the wider issue width and
increased clock gating. To model power in an architectural simulator executing SPEC2000
benchmarks, Kumar et al. use the activity-based Wattch power models, calibrated separately
for each core. The calibration applies scaling and offset factors so that the simulator's results
match the estimated peak and typical power consumption of each core.
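The calibration step can be pictured with a small sketch. It assumes the simplest possible form, a single linear map (calibrated = scale × raw + offset) fitted so that the simulator's peak and typical figures land on the estimated ones. The source only says that scaling and offset factors are used, so the linear form, the function names, and all numbers below are illustrative assumptions rather than the authors' actual procedure.

```python
def fit_scale_offset(sim_peak, sim_typical, target_peak, target_typical):
    """Fit (scale, offset) so that scale * sim + offset matches both targets.

    Assumption: one linear map per core; two anchor points (peak and
    typical power) determine the two unknowns.
    """
    if sim_peak == sim_typical:
        raise ValueError("simulated peak and typical power must differ")
    scale = (target_peak - target_typical) / (sim_peak - sim_typical)
    offset = target_peak - scale * sim_peak
    return scale, offset


def calibrate(raw_power, scale, offset):
    """Apply the per-core calibration to a raw simulator power reading."""
    return scale * raw_power + offset


if __name__ == "__main__":
    # Hypothetical numbers in watts, not taken from the study.
    scale, offset = fit_scale_offset(
        sim_peak=40.0, sim_typical=25.0, target_peak=50.0, target_typical=20.0
    )
    print(scale, offset)                    # 2.0 -30.0
    print(calibrate(25.0, scale, offset))   # 20.0
```

With two anchor points and two unknowns the fit is exact; a calibration against more data points would need a least-squares fit instead.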
The multi-core architecture in this study is used in a specific way: only one application is
run at any one time, i.e., only one core is active. The appropriate core to run the application is
chosen to optimize a given objective function (a combination of energy and performance goals).
All other cores are powered down, expending neither dynamic nor leakage power. Because
there is a cost to switch an application from one core to another, the granularity of switching
is kept at the OS scheduling quantum (task switching). This is convenient for two reasons.
First, the operating system can orchestrate the core switching. Second, saving and restoring
the processor state happens by default at the scheduling interval, so it does not represent an
additional overhead for core switching (see footnote 14). Alternatively, the core on which to run an
application could be chosen (even statically) at the granularity of an entire application, but
this would preclude adaptation to the needs of the application's individual program phases.
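As a rough illustration of choosing a core once per scheduling quantum, the sketch below picks, for each quantum, the core that minimizes an objective function. It assumes that per-core energy and delay estimates for the upcoming quantum are already available, and it uses energy × delay as the default objective. The `Sample` type, the function names, and the numbers are hypothetical; how the estimates are obtained is outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    energy: float  # estimated energy for one quantum on this core (joules)
    delay: float   # estimated execution time for the same work (seconds)


def energy_delay(sample):
    """Default objective: energy-delay product (lower is better)."""
    return sample.energy * sample.delay


def pick_core(estimates, objective=energy_delay):
    """Return the name of the core with the smallest objective value."""
    return min(estimates, key=lambda core: objective(estimates[core]))


def schedule(per_quantum_estimates, objective=energy_delay):
    """Choose one core per quantum; all other cores are assumed powered down."""
    return [pick_core(est, objective) for est in per_quantum_estimates]


if __name__ == "__main__":
    # Hypothetical estimates for two consecutive quanta.
    quanta = [
        {"EV4": Sample(1.0, 4.0), "EV8-": Sample(6.0, 1.0)},
        {"EV4": Sample(1.0, 10.0), "EV8-": Sample(4.0, 2.0)},
    ]
    print(schedule(quanta))  # ['EV4', 'EV8-']
```

Passing a different `objective` (for example, one that weights delay more heavily) changes the selection without touching the scheduling loop, which mirrors the idea of optimizing "a given objective function."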
Kumar et al. show that both power and performance vary considerably depending on
program phase. On the same core, as expected, performance varies from phase to phase. More
importantly, the size of the performance difference among phases depends on which core
executes the application. For instance, on EV4 the performance difference among
phases might not be that great, whereas on EV8- it can vary widely. As a result, the
relative performance of the cores varies with application phase: in some phases EV8-
performs much better than the other cores, while in other phases the difference is
hardly noticeable.
Things become more interesting when energy is taken into account in addition to performance.
Tracking Energy × Delay (EDP) across different phases shows that the relative difference
among cores for this metric also varies with phase. In addition, the ordering of the cores
based on this metric is frequently upset: sometimes EV4 has a better EDP than EV8-, and
vice versa! This, of course, is a strong incentive for core switching at phase granularity.
Note, however, that no interval-based approach, even with oracle knowledge, can guarantee
the global optimization of EDP or ED²P. See “Sidebar: Pitfalls in optimizing EDP.”
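One way to see why per-interval decisions cannot guarantee a globally optimal EDP is that the EDP of a whole run is (total energy) × (total delay), which is not the sum of the per-interval EDPs. The sketch below uses made-up energy and delay values for two intervals and two cores, ignores switching costs, and compares a greedy choice (best local EDP in each interval) with an exhaustive search over all assignments. It is an illustration built for this purpose, not a reproduction of the sidebar's argument.

```python
from itertools import product

# Hypothetical (energy, delay) pairs per interval and core.
INTERVALS = [
    {"EV4": (1.0, 10.0), "EV8-": (5.0, 3.0)},
    {"EV4": (1.0, 3.0), "EV8-": (2.0, 1.0)},
]


def global_edp(intervals, assignment):
    """EDP of the whole run: total energy times total delay."""
    total_energy = sum(intervals[i][core][0] for i, core in enumerate(assignment))
    total_delay = sum(intervals[i][core][1] for i, core in enumerate(assignment))
    return total_energy * total_delay


def greedy_assignment(intervals):
    """Pick, in each interval, the core with the lowest local EDP."""
    return tuple(
        min(options, key=lambda core: options[core][0] * options[core][1])
        for options in intervals
    )


def best_assignment(intervals):
    """Exhaustively search all per-interval core choices for the lowest global EDP."""
    candidates = product(*(options.keys() for options in intervals))
    return min(candidates, key=lambda a: global_edp(intervals, a))


if __name__ == "__main__":
    greedy = greedy_assignment(INTERVALS)
    best = best_assignment(INTERVALS)
    print(greedy, global_edp(INTERVALS, greedy))  # ('EV4', 'EV8-') 33.0
    print(best, global_edp(INTERVALS, best))      # ('EV4', 'EV4') 26.0
```

In the second interval EV8- has the better local EDP (2 versus 3), yet running that interval on EV4 gives the lower EDP for the whole run, because the cross terms between intervals (energy of one interval multiplied by delay of the other) dominate.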
14 L1 caches are local to each core, so only the contents of the shared L2 cache are preserved across a core switch.