Optimizing Capacitance and Switching Activity to Reduce Dynamic Power - Computer Architecture Techniques for Power-Efficiency


a variety of sources, including peak-to-typical power ratios for same-generation Intel

processors, and extrapolations from available thermal data. They also assume that the variance in

typical power consumption increases in more complex cores due to the wider issue width and

increased clock gating. To model power in an architectural simulator executing SPEC2000

benchmarks, Kumar et al. use the activity-based Wattch power models, calibrated

separately for each core. This is done with the help of scaling and offset factors so as to match

the results of the simulator with the estimated peak and typical power consumptions of the

cores.
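
The calibration step can be illustrated with a minimal sketch. It assumes that a single linear map (one scaling factor and one offset per core) is fitted through two points, the typical and the peak power. The text does not give the exact form of the calibration, so the function names and the wattage figures below are hypothetical.

```python
def fit_scale_offset(raw_typical, raw_peak, target_typical, target_peak):
    """Fit target = scale * raw + offset through two calibration points.

    Assumption: one linear map per core; the source only says that
    "scaling and offset factors" are used.
    """
    if raw_peak == raw_typical:
        raise ValueError("raw peak and typical power must differ")
    scale = (target_peak - target_typical) / (raw_peak - raw_typical)
    offset = target_typical - scale * raw_typical
    return scale, offset


def calibrate(raw_power, scale, offset):
    """Map a raw simulator power reading onto the calibrated scale."""
    return scale * raw_power + offset


# Hypothetical values in watts, for illustration only.
scale, offset = fit_scale_offset(
    raw_typical=10.0, raw_peak=30.0, target_typical=25.0, target_peak=65.0
)
calibrated_peak = calibrate(30.0, scale, offset)
```

With these made-up inputs the fitted map reproduces both calibration points, so every intermediate simulator reading is interpolated linearly between the estimated typical and peak power of the core.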

The multi-core architecture in this study is used in a specific way: only one application is

run at any one time, i.e., only one core is active. The appropriate core to run the application is

chosen to optimize a given objective function (a combination of energy and performance goals).

All other cores are powered down, expending neither dynamic nor leakage power. Because

there is a cost to switch an application from one core to another, the granularity of switching

is kept at the OS scheduling quantum (task switching). This is convenient for two reasons.

First, the operating system can orchestrate the core switching. Second, saving and restoring

the processor state happens by default at the scheduling interval so it does not represent an

additional overhead for core switching.[14] Alternatively, the choice of which core runs an

application could be made (even statically) at the granularity of an entire application, but

this would preclude adaptation to the needs of individual program phases of the application.
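
A per-quantum selection policy of this kind can be sketched as follows. This is an illustrative oracle that assumes the energy and delay of each interval on each core are known in advance and that switching costs are ignored. The function names, the data layout, and the numbers are hypothetical and are not taken from Kumar et al.

```python
from typing import Callable, Dict, List, Tuple

# (energy in joules, delay in seconds) of one interval on one core
Sample = Tuple[float, float]


def edp(sample: Sample) -> float:
    """Energy-delay product, used here as the objective function."""
    energy, delay = sample
    return energy * delay


def choose_cores(
    intervals: List[Dict[str, Sample]],
    objective: Callable[[Sample], float] = edp,
) -> List[str]:
    """For each scheduling quantum, pick the core that minimizes the objective."""
    return [
        min(per_core, key=lambda core: objective(per_core[core]))
        for per_core in intervals
    ]


# Hypothetical measurements for two program phases.
intervals = [
    {"EV4": (1.0, 4.0), "EV8-": (6.0, 1.0)},  # EDP: 4 vs 6
    {"EV4": (1.0, 8.0), "EV8-": (5.0, 1.0)},  # EDP: 8 vs 5
]
schedule = choose_cores(intervals)
```

Passing a different `objective` (for example, energy alone or energy times delay squared) changes the policy without touching the selection loop, which mirrors the idea of a configurable objective function.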

Kumar et al. show that both power and performance vary considerably depending on

program phase. On the same core—as expected—performance varies from phase to phase. But

more importantly, the relative performance difference among phases depends on which core

executes the application. For instance, running on EV4, the performance difference among

phases might not be that great; in contrast, it can vary widely on EV8-. This makes the

relative performance among cores vary according to application phase: in some phases EV8-

performance is much higher than that of the other cores; in other phases the performance difference is

hardly noticeable.

Things are more interesting when, in addition to performance, energy is taken into

account. Tracking Energy × Delay across different phases on the same core shows that the

relative difference for this metric across cores also varies with phase. In addition, the ordering of

the cores based on this metric is frequently upset! This means that sometimes EV4 can have a better

EDP than EV8- and vice versa! This of course is a strong incentive for core switching on a

phase granularity. Note, however, that no interval-based approach, even with oracle knowledge,

can guarantee the global optimization of EDP or ED²P. See “Sidebar: Pitfalls in optimizing

EDP.”
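
One way to see why per-interval optimization cannot guarantee a global optimum is that the EDP of a whole run is the product of total energy and total delay, not the sum of the per-interval EDPs. The sketch below uses two hypothetical cores, A and B, with made-up energy and delay values, and ignores switching costs. It compares the schedule that minimizes EDP in each interval against an exhaustive search over all schedules.

```python
from itertools import product

# Hypothetical (energy, delay) per interval and per core, for illustration only.
intervals = [
    {"A": (1.0, 4.0), "B": (5.0, 1.0)},  # interval EDP: A = 4, B = 5
    {"A": (4.0, 1.0), "B": (1.0, 5.0)},  # interval EDP: A = 4, B = 5
]


def total_edp(schedule):
    """EDP of the whole run: (sum of energies) * (sum of delays)."""
    energy = sum(intervals[i][core][0] for i, core in enumerate(schedule))
    delay = sum(intervals[i][core][1] for i, core in enumerate(schedule))
    return energy * delay


# Interval-based choice: minimize EDP independently in every interval.
greedy = tuple(
    min(cores, key=lambda c: cores[c][0] * cores[c][1]) for cores in intervals
)

# Global choice: try every possible schedule.
best = min(product(*(cores.keys() for cores in intervals)), key=total_edp)

print(greedy, total_edp(greedy))  # ('A', 'A') 25.0
print(best, total_edp(best))      # ('A', 'B') 18.0
```

In this contrived case the locally optimal schedule runs both intervals on A for a total EDP of 25, whereas picking the locally worse core B in the second interval lowers the total to 18.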


[14] L1 caches are local to each core, so only the contents of the shared L2 cache are preserved across a core switch.
