Optimizing Capacitance and Switching Activity to Reduce Dynamic Power - Computer Architecture Techniques for Power-Efficiency

main characteristics for the three approaches. Equally important to the partitioning technique

is the method for selecting a cache configuration to achieve power or performance goals.

4.8.1 Trading Memory Between Cache Levels

Cache resizing was also proposed in Albonesi's paper on complexity-adaptive structures along

with instruction queue resizing [7]. Both techniques rely on structures partitioned in segments

using buffered wires. Regarding caches, the whole memory comprising the cache hierarchy is

assumed to be segmented in this manner.

Albonesi's proposal calls for a variable division between the L1 and the L2. This dynamic

division is based on assigning memory segments to be either in the L1 or in the L2.

Architecturally, the two caches are resized by increasing or decreasing their associativity—not

by changing the number of sets. Thus, cache indices remain the same throughout size changes.

This is necessary to avoid making resident data inaccessible after a change in indexing. Furthermore,

cache exclusion is imposed between the L1 and the L2, guaranteeing that data remain

unique regardless of the movable boundary between the two levels. Cache inclusion, on the

other hand, can result in the same data appearing twice in the same cache. This is possible if

two copies of the same data initially residing in the L1 and the L2, respectively, end up in the

same cache after a resizing operation.
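
The point about indexing can be illustrated with a minimal sketch. The block size, the number of sets, and the function names below are assumptions chosen for illustration and do not come from Albonesi's paper. The sketch only shows that a set index computed from the address depends on the number of sets, not on the number of ways, so adding or removing ways leaves every resident block where a lookup expects it.

```python
# Illustrative sketch: all constants and names here are hypothetical.

def set_index(address: int, block_bytes: int, num_sets: int) -> int:
    """Set index of an address; it depends on block size and set count only."""
    return (address // block_bytes) % num_sets


def capacity_bytes(num_sets: int, ways: int, block_bytes: int) -> int:
    """Total capacity of a set-associative cache."""
    return num_sets * ways * block_bytes


BLOCK_BYTES = 64    # assumed cache line size
NUM_SETS = 1024     # assumed set count, held fixed while the cache is resized
ADDRESS = (1024 + 5) * BLOCK_BYTES  # block number 1029

# Resizing by associativity: capacity doubles, the index is untouched.
small = capacity_bytes(NUM_SETS, ways=2, block_bytes=BLOCK_BYTES)
large = capacity_bytes(NUM_SETS, ways=4, block_bytes=BLOCK_BYTES)
index_fixed_sets = set_index(ADDRESS, BLOCK_BYTES, NUM_SETS)

# Resizing by set count instead: the same address now maps to another set,
# so a block stored under the old index would no longer be found.
index_doubled_sets = set_index(ADDRESS, BLOCK_BYTES, 2 * NUM_SETS)

print(small // 1024, "KB ->", large // 1024, "KB")
print("index with fixed set count:", index_fixed_sets)
print("index if sets were doubled:", index_doubled_sets)
```

With these assumed numbers the capacity grows from 128 KB to 256 KB while the index stays at 5, whereas doubling the set count would move the same address to set 1029.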

The variable boundary between the L1 and the L2 is intended to improve performance. Making

the L1 smaller allows for a faster clock (the latency of the cache in cycles does not change), while

making it larger increases its hit ratio. In this initial work, no attempt is made to dynamically

control the configuration of the caches. Instead, all possible configurations are studied, each

persisting throughout the execution of a program.

Although this complexity-adaptive scheme yields performance benefits (depending on

the program and the configuration), no assessment is provided regarding its impact on power

consumption. However, the change in associativity in the L1 and the L2 (magnified by the

difference in the number of accesses between the two caches) can affect power consumption,

despite the fact that the total amount of active memory remains constant.
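
A toy calculation may make this argument more concrete. The model below is an assumption and not part of the original study: it supposes that every access probes all ways of the cache it targets, that probing one way costs the same in the L1 and the L2, and that the access counts stay fixed across configurations (in practice the L2 access count would change with the L1 hit ratio). All names and numbers are hypothetical.

```python
# Toy model under stated assumptions; not a measured or published result.

def relative_dynamic_energy(l1_ways: int, l2_ways: int,
                            l1_accesses: int, l2_accesses: int,
                            energy_per_way_probe: float = 1.0) -> float:
    """Relative dynamic energy if each access probes every way of its cache."""
    return energy_per_way_probe * (l1_accesses * l1_ways + l2_accesses * l2_ways)


L1_ACCESSES = 1_000_000  # assumed: the L1 is accessed far more often
L2_ACCESSES = 50_000     # assumed: only L1 misses reach the L2

# Two divisions of the same 8 ways (assumed total), so active memory is equal.
config_a = relative_dynamic_energy(2, 6, L1_ACCESSES, L2_ACCESSES)
config_b = relative_dynamic_energy(4, 4, L1_ACCESSES, L2_ACCESSES)

print("L1 = 2 ways, L2 = 6 ways:", config_a)
print("L1 = 4 ways, L2 = 4 ways:", config_b)
```

Under these assumptions, moving two ways from the L2 to the L1 raises the relative energy from 2,300,000 to 4,200,000 units, because the extra ways are probed by the much larger number of L1 accesses.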

Following the initial proposal for the variable L1/L2 division, Balasubramonian, Albonesi,

Buyuktosunoglu, and Dwarkadas take it one step further by proposing a more specific

and more detailed cache organization to achieve the same goal [21]. More importantly, they

also propose mechanisms to control the configuration of the caches at run-time.

The organization is based on a 2 MB physical cache that is partitioned into four distinct

512 KB subarrays. Each subarray is further partitioned into four segments with the help of

repeaters in the wordlines. Each of these segments acts as an associative way, either allocated

to the L1 or to the L2. Figure 4.17 shows the organization of the physical cache.
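
The sizes implied by this description can be worked out with a short sketch. The segment size follows directly from the numbers in the text. The `split_capacity` helper is hypothetical: it treats every segment as an independently assignable unit, which is a simplification, since the actual design may restrict which allocations are legal.

```python
# Arithmetic from the figures in the text; the helper is a hypothetical sketch.

KB = 1024
TOTAL_BYTES = 2 * 1024 * KB      # 2 MB physical cache
SUBARRAYS = 4
SEGMENTS_PER_SUBARRAY = 4

SUBARRAY_BYTES = TOTAL_BYTES // SUBARRAYS                 # 512 KB
SEGMENT_BYTES = SUBARRAY_BYTES // SEGMENTS_PER_SUBARRAY   # 128 KB
TOTAL_SEGMENTS = SUBARRAYS * SEGMENTS_PER_SUBARRAY        # 16


def split_capacity(l1_segments: int) -> tuple[int, int]:
    """Return (L1 bytes, L2 bytes) when l1_segments segments go to the L1.

    Assumes any number of segments may be assigned to the L1; the remaining
    segments form the L2, so the total stays at 2 MB.
    """
    if not 0 <= l1_segments <= TOTAL_SEGMENTS:
        raise ValueError("l1_segments must be between 0 and 16")
    l2_segments = TOTAL_SEGMENTS - l1_segments
    return l1_segments * SEGMENT_BYTES, l2_segments * SEGMENT_BYTES


l1_bytes, l2_bytes = split_capacity(4)
print("segment size:", SEGMENT_BYTES // KB, "KB")
print("L1:", l1_bytes // KB, "KB, L2:", l2_bytes // KB, "KB")
```

For example, assigning four segments to the L1 gives a 512 KB L1 and a 1536 KB L2 under this simplified view.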
