When a hit takes place on a line in MRU position i, the corresponding counter MRU[ i ] is incremented.
These statistics are important because hits in various MRU positions correspond to hits
in different cache configurations. Hits in the first MRU position correspond to hits in a direct-
mapped cache; the combined hits in the first and second MRU position correspond to hits in
a two-way set-associative cache; and so on. Thus, hits in any configuration of the primary and
secondary groups can be derived simply by summing up hits in the appropriate MRU positions.
This leads to one-shot configuration by allowing one to assess in one go all possible outcomes
and select the “best” configuration. In contrast, a configuration search would have to try each
and every configuration for an entire interval and then make a decision.
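The derivation of per-configuration hit counts from the MRU-position counters can be sketched as follows; the counter values and function name are illustrative, not the authors' implementation:

```python
# Hits in MRU position i correspond to hits a cache of associativity > i
# would capture, so the hits of a k-way configuration are simply the sum
# of the first k MRU-position counters.

def hits_for_associativity(mru_hits, ways):
    """Hits a `ways`-way set-associative cache would have seen."""
    return sum(mru_hits[:ways])

mru_hits = [500, 120, 40, 15]   # example counters for one interval

direct_mapped = hits_for_associativity(mru_hits, 1)  # 500
two_way = hits_for_associativity(mru_hits, 2)        # 620
four_way = hits_for_associativity(mru_hits, 4)       # 675
```

One pass over the counters thus prices every candidate associativity at once, which is what makes the configuration "one-shot" rather than a trial-and-error search.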
Here is how one-shot configuration works in more detail. Statistics are gathered in intervals of 100 000 instructions. Since the statistics are independent of the cache configuration used during the interval, they can be used to try "what if" scenarios for any cache configuration. Assuming that the statistics of one interval are a good indication of the behavior of the next, the most appropriate configuration for the next interval can thus be identified.
The “what if” scenarios use simple memory access latency and energy cost models. These
models calculate the effective memory latency and the energy of a configuration as a function
of the hits in its primary and secondary groups. The calculations are performed in a software
interrupt handler which also decides on the next configuration.
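A minimal sketch of such a "what if" evaluation is shown below. The latency and energy constants, the dictionary fields, and the primary/secondary split are assumptions for illustration; the actual cost models are not reproduced here.

```python
# Evaluate one candidate configuration from interval statistics.
# Every access probes the primary group; the secondary group is probed
# only on a primary miss; remaining accesses go to the next memory level.

def evaluate(config, mru_hits, accesses):
    primary_hits = sum(mru_hits[:config["primary_ways"]])
    secondary_hits = sum(mru_hits[config["primary_ways"]:config["total_ways"]])
    misses = accesses - primary_hits - secondary_hits

    latency = (primary_hits * config["t_primary"]
               + secondary_hits * config["t_secondary"]
               + misses * config["t_miss"])
    energy = (accesses * config["e_primary"]          # primary probed always
              + secondary_hits * config["e_secondary"]
              + misses * config["e_miss"])
    return latency, energy

cfg = {"primary_ways": 1, "total_ways": 4,
       "t_primary": 1, "t_secondary": 3, "t_miss": 20,
       "e_primary": 1.0, "e_secondary": 2.0, "e_miss": 10.0}

lat, en = evaluate(cfg, [500, 120, 40, 15], accesses=700)  # 1525, 1300.0
```

Running `evaluate` for each candidate configuration and picking the lowest-energy one whose estimated latency stays within the tolerance is exactly the decision the interrupt handler makes.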
The policy for deciding the next configuration is to go for the lowest energy consumption given a limit on the tolerated performance loss (called the tolerance level ). This sounds similar to the policy used in selective cache ways, but it goes further: it has memory. It keeps an account of what happens in each interval and builds credit or debit for both performance and energy. So, for example, if previous configurations performed better than the corresponding estimates indicated, the policy becomes more aggressive in trying to reduce energy, since it has performance credit . Conversely, if a performance deficit has accumulated from previous configurations, the policy has to make up for it, giving up on energy reduction.
This accounting scheme is a consequence of one-shot configuration relying on an estimate of what will happen in the upcoming interval. That estimate relies, in turn, on the assumption that the measured statistics do not differ noticeably from interval to interval. In reality they do differ. Accounting normalizes the differences between the estimated and the actual behavior by factoring the accumulated credit or debit into the next configuration decision.
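The credit/debit idea can be sketched as a small running account; the class and field names are hypothetical, chosen only to show the mechanism:

```python
# Track the running difference between estimated and actual performance,
# so the next decision can spend banked credit on energy savings (or must
# repay a deficit by choosing faster, less energy-efficient configurations).

class PerformanceAccount:
    def __init__(self):
        self.credit = 0  # positive: ran faster than the estimates predicted

    def settle(self, estimated_cycles, actual_cycles):
        """Called at the end of an interval to reconcile estimate vs. reality."""
        self.credit += estimated_cycles - actual_cycles

    def budget(self, baseline_cycles, tolerance):
        """Cycles the next interval may lose versus the full-performance
        baseline: the tolerated slack plus any accumulated credit."""
        return baseline_cycles * tolerance + self.credit

acct = PerformanceAccount()
acct.settle(estimated_cycles=1000, actual_cycles=900)  # banked 100 cycles
slack = acct.budget(baseline_cycles=10_000, tolerance=1 / 16)
```

With a tolerance of 1/16 and 100 cycles of credit, the next interval may tolerate 725 cycles of slowdown instead of 625, so the policy can pick a lower-energy configuration than the tolerance alone would allow.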
The accounting cache yields very good power results with a rather small impact on performance. As Figure 4.20 shows, for tolerance settings of 1/64, 1/16, and 1/4 (1.5%, 6.25%, and 25% in the graph), energy savings range from 54% to 58% for the instruction L1, 29% to 45% for the data L1, and 25% to 63% for a unified L2 with parallel tag/data access. Overall, for