Optimizing Capacitance and Switching Activity to Reduce Dynamic Power - Computer Architecture Techniques for Power-Efficiency


[Figure: storage elements Element0 through Element3 attached to Address and Data lines of a bus, with buffers separating the elements.]

FIGURE 4.11: Complexity-adaptive structures. Adapted from [ 7 ].

are used in which a long wire is partitioned into shorter segments by placing buffers at regular intervals along its length. Most of the processor's critical structures need (or will soon need) such techniques.
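A simplified Elmore-delay model shows why repeaters are used. An unrepeated distributed RC wire of length L has a delay of roughly r·c·L²/2. Splitting the wire into k repeated segments gives about r·c·L²/(2k) + k·t_buf instead. At the optimal k this delay grows only linearly in L. The sketch below is a rough illustration under stated assumptions:

- uniform per-unit resistance r and capacitance c;
- an idealized repeater delay t_buf;
- driver resistance and repeater loading are ignored.

The function names are hypothetical.

```python
import math


def unbuffered_delay(r: float, c: float, length: float) -> float:
    """Elmore delay of a distributed RC wire: quadratic in its length."""
    return 0.5 * r * c * length ** 2


def buffered_delay(r: float, c: float, length: float, segments: int, t_buf: float) -> float:
    """Delay when the wire is cut into equal segments, each driven by a repeater."""
    seg_len = length / segments
    return segments * (unbuffered_delay(r, c, seg_len) + t_buf)


def best_segment_count(r: float, c: float, length: float, t_buf: float) -> int:
    """Integer segment count minimizing r*c*L^2/(2k) + k*t_buf."""
    k = math.sqrt(0.5 * r * c * length ** 2 / t_buf)  # continuous optimum
    candidates = {max(1, math.floor(k)), max(1, math.ceil(k))}
    return min(candidates, key=lambda n: buffered_delay(r, c, length, n, t_buf))


if __name__ == "__main__":
    r = c = 1.0  # arbitrary per-unit-length resistance and capacitance
    t_buf = 0.5  # arbitrary repeater delay
    for length in (5.0, 10.0, 20.0):
        k = best_segment_count(r, c, length, t_buf)
        print(length, unbuffered_delay(r, c, length), k, buffered_delay(r, c, length, k, t_buf))
```

Take r = c = 1, t_buf = 0.5 and L = 10 (arbitrary units). The unrepeated delay is then 50, while ten repeated segments give 10.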

The key observation made by Albonesi is that buffered wires readily lend themselves to partitioning without any significant additional overhead. In other words, the ability to disable part of a buffered wire (e.g., from some point onwards) comes almost for free with buffering. The only catch is that the buffers must be tri-state. Partitioning also sidesteps the increased power consumption caused by the repeaters (see “Sidebar: Wire Partitioning”) by using only as much wire as needed.
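The "only as much wire as needed" argument can be made concrete with a common form of the dynamic-power relation, P = a·C·V²·f. Here a is switching activity, C is switched capacitance, V is supply voltage and f is clock frequency.

In the toy model below, tri-stating the repeaters from segment k onward removes those segments from C. Switched capacitance, and therefore dynamic power, then scales with the number of enabled segments. The class and parameter names are hypothetical. Giving every segment the same capacitance is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class SegmentedWire:
    """Toy buffered wire whose tri-state repeaters can disconnect a suffix of it."""

    num_segments: int
    c_segment: float   # wire capacitance of one segment (hypothetical units)
    c_repeater: float  # capacitance one repeater adds per segment (hypothetical)

    def __post_init__(self) -> None:
        self.enabled = self.num_segments  # start with the whole wire active

    def set_enabled(self, k: int) -> None:
        """Drive segments [0, k); repeaters from segment k onward are tri-stated."""
        if not 1 <= k <= self.num_segments:
            raise ValueError("k must be between 1 and num_segments")
        self.enabled = k

    def switched_capacitance(self) -> float:
        # Segments behind a high-impedance repeater do not toggle.
        return self.enabled * (self.c_segment + self.c_repeater)

    def dynamic_power(self, vdd: float, freq: float, activity: float) -> float:
        # P = a * C * V^2 * f
        return activity * self.switched_capacitance() * vdd ** 2 * freq
```

Consider SegmentedWire(8, 2.0, 1.0). Enabling 2 of its 8 segments cuts switched capacitance from 24 to 6 units. That is a 4x reduction in dynamic power at the same voltage, frequency and activity.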

Large CAM or SRAM structures need long wires for their bitlines and wordlines. Implementing these as buffered wires lets such structures partition and deactivate part of themselves, again virtually for free. Figure 4.11 shows the main idea.
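The same idea can be sketched for a CAM built from partitions, such as Element0 through Element3 in Figure 4.11. Only enabled partitions receive the broadcast tag. The number of comparisons per search, used here as a rough proxy for dynamic energy, therefore tracks the active size. Several details are assumptions for illustration:

- the class name;
- the policy of discarding entries held in partitions that are switched off;
- the comparison-count energy proxy.

```python
from typing import List, Optional


class ResizableCAM:
    """CAM built from equal partitions; only enabled partitions take part in searches."""

    def __init__(self, partitions: int, entries_per_partition: int) -> None:
        self.partitions = partitions
        self.per_partition = entries_per_partition
        self.active = partitions  # number of partitions connected to the search lines
        self.entries: List[Optional[int]] = [None] * (partitions * entries_per_partition)
        self.comparisons = 0  # rough proxy for search energy

    def capacity(self) -> int:
        return self.active * self.per_partition

    def resize(self, active: int) -> None:
        if not 1 <= active <= self.partitions:
            raise ValueError("active must be between 1 and partitions")
        # Assumed policy: entries in partitions being disabled are discarded.
        for i in range(active * self.per_partition, len(self.entries)):
            self.entries[i] = None
        self.active = active

    def insert(self, slot: int, tag: int) -> None:
        if not 0 <= slot < self.capacity():
            raise IndexError("slot is in a disabled partition")
        self.entries[slot] = tag

    def lookup(self, tag: int) -> Optional[int]:
        live = self.entries[: self.capacity()]
        self.comparisons += len(live)  # all enabled entries compare in parallel
        for i, entry in enumerate(live):
            if entry == tag:
                return i
        return None
```

Take a CAM with 4 partitions of 8 entries each. It performs 32 comparisons per search at full size and only 8 after resize(1).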

Albonesi's goal was a complexity-adaptive architecture in which various structures can be resized to provide a size versus speed trade-off, adjusted to best suit application needs. If, for example, an application would benefit from larger structures, even at the expense of a slower clock, a complexity-adaptive architecture would adjust to provide this trade-off. Conversely, applications that have no need for large structures but whose performance is tied to clock speed benefit from scaling down the resizable structures to achieve faster clocks. In practice, varying the clock quickly enough can be challenging. Moreover, slowing down everything just to increase the size of a single structure can quickly erode any potential benefit. For these reasons, rather than slowing down the clock, the latency (in cycles) of the resized structures can be increased instead.
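A small calculation illustrates why adding latency can beat slowing the clock. The numbers are hypothetical: access times per structure size are taken from ACCESS_TIME_PS, and the base clock period is 500 ps. Stretching the clock so a larger structure fits in one cycle slows every cycle. Keeping the clock and charging extra cycles penalizes only accesses to that structure. The run-time model also assumes every extra cycle of latency is a fully exposed stall. This overstates the cost when the pipeline can hide latency.

```python
# Hypothetical access times (picoseconds) for a resizable structure, keyed by size in KB.
ACCESS_TIME_PS = {16: 500, 32: 700, 64: 950}


def latency_cycles(access_ps: int, period_ps: int) -> int:
    """Cycles needed to cover an access; integer ceiling division avoids float rounding."""
    return -(-access_ps // period_ps)


def options(size_kb: int, base_period_ps: int = 500) -> dict:
    """Compare stretching the clock against adding cycles of latency for one size."""
    access_ps = ACCESS_TIME_PS[size_kb]
    return {
        # Option 1: lengthen the clock period until the access fits in one cycle.
        "slow_clock": {"period_ps": max(base_period_ps, access_ps), "latency_cycles": 1},
        # Option 2: keep the clock and let the access take several cycles.
        "more_cycles": {
            "period_ps": base_period_ps,
            "latency_cycles": latency_cycles(access_ps, base_period_ps),
        },
    }


def run_time_ps(base_cycles: int, accesses: int, period_ps: int, latency: int) -> int:
    """Toy execution time: each access beyond one cycle adds fully exposed stall cycles."""
    return (base_cycles + accesses * (latency - 1)) * period_ps


if __name__ == "__main__":
    for size in sorted(ACCESS_TIME_PS):
        for name, cfg in options(size).items():
            t = run_time_ps(1000, 200, cfg["period_ps"], cfg["latency_cycles"])
            print(f"{size:>3} KB {name:<11} {t} ps")
```

Consider a workload of 1000 cycles containing 200 accesses to a 64 KB structure. With these made-up numbers, it takes 950,000 ps with the stretched clock. With a 2-cycle access at the original clock it takes 600,000 ps.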

Albonesi gives convincing examples by adapting two separate structures, the cache hierarchy and the instruction queue, to the needs of various applications. He did this, however, by examining all possible configurations of these structures over the application as a whole, although he also mentions finer-grain adaptability at the end. In other words, he
