[Figure: Element0 through Element3 attached to partitioned Address and Data buses, with buffers between the bus segments.]
FIGURE 4.11: Complexity-adaptive structures. Adapted from [7].
are used in which a long wire is partitioned into shorter segments by placing buffers at regular
intervals along its length. Most of the processor's critical structures are (or will be) in need of
such methodologies.
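The Python sketch below shows why repeaters help. It compares the Elmore delay of an unbuffered distributed RC wire, which grows with the square of its length, against the same wire split into equal segments by buffers. All parameter values are hypothetical normalized units. The model deliberately ignores driver resistance and buffer input capacitance, so it illustrates the scaling trend only and does not describe any real process.

```python
# Hypothetical, normalized wire parameters (illustrative assumptions only):
# resistance and capacitance per unit length, plus a fixed intrinsic delay
# for each inserted repeater.
R_PER_UNIT = 1.0
C_PER_UNIT = 1.0
T_BUFFER = 2.0


def unbuffered_delay(length: float) -> float:
    """Elmore delay of a distributed RC line, 0.5 * r * c * L^2 (quadratic in L)."""
    return 0.5 * R_PER_UNIT * C_PER_UNIT * length ** 2


def buffered_delay(length: float, segments: int) -> float:
    """Delay when repeaters split the wire into `segments` equal pieces.

    Each piece pays the quadratic term only on its own short length, plus one
    buffer delay, so the total grows roughly linearly with wire length.
    """
    seg = length / segments
    return segments * (unbuffered_delay(seg) + T_BUFFER)


def best_buffered_delay(length: float, max_segments: int = 32) -> float:
    """Smallest buffered delay over 1..max_segments segments."""
    return min(buffered_delay(length, k) for k in range(1, max_segments + 1))


if __name__ == "__main__":
    for length in (4, 8, 16):
        print(f"L={length:2d}  unbuffered={unbuffered_delay(length):6.1f}  "
              f"best buffered={best_buffered_delay(length):6.1f}")
```

With these assumed values, doubling the wire length quadruples the unbuffered delay (8, 32, 128) but only doubles the best buffered delay (8, 16, 32).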
The key observation made by Albonesi is that buffered wires readily lend themselves
to partitioning without any significant additional overhead. In other words, the ability to
disable part of a buffered wire (e.g., from some point onwards) comes almost for free
with buffering. The only catch is that tri-state buffers must be used in this case. Partitioning also
circumvents the problem of increased power consumption due to the repeaters (see “Sidebar:
Wire Partitioning”) by using only as much wire as needed.
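The partitioning idea can be modeled as a wire whose repeaters are tri-state buffers. Switching the buffer that drives segment k + 1 into its high-impedance state isolates everything beyond segment k. The Python sketch below assumes, purely for illustration, that every segment has the same delay and the same switched energy. Under that assumption, both the delay and the dynamic energy per transition scale with the number of segments still in use. The class name `SegmentedWire` and its parameters are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentedWire:
    """Buffered wire whose repeaters are tri-state buffers (illustrative model).

    Assumptions: every segment has the same delay and switched energy, and the
    wire can be cut after segment k by placing the tri-state buffer that drives
    segment k + 1 in its high-impedance state.
    """

    num_segments: int
    seg_delay: float = 1.0   # hypothetical delay per segment, including its buffer
    seg_energy: float = 1.0  # hypothetical dynamic energy per transition per segment
    active: int = field(init=False)

    def __post_init__(self) -> None:
        if self.num_segments < 1:
            raise ValueError("num_segments must be at least 1")
        self.active = self.num_segments  # start with the full wire enabled

    def cut_after(self, k: int) -> None:
        """Keep segments 1..k active; everything beyond is isolated (high-Z)."""
        if not 1 <= k <= self.num_segments:
            raise ValueError("k must be between 1 and num_segments")
        self.active = k

    def delay(self) -> float:
        # The signal only has to traverse the active segments.
        return self.active * self.seg_delay

    def energy_per_transition(self) -> float:
        # Isolated segments do not switch, so they add no dynamic energy.
        return self.active * self.seg_energy


if __name__ == "__main__":
    bus = SegmentedWire(num_segments=4)
    print(bus.delay(), bus.energy_per_transition())  # full wire
    bus.cut_after(2)  # e.g., only the two nearest elements are in use
    print(bus.delay(), bus.energy_per_transition())
```

The only extra hardware the model assumes is the tri-state enable on buffers that are already present. This matches the point that partitioning comes almost for free once the wire is buffered.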
Large CAM or SRAM structures need long wires for their bitlines and wordlines.
Implementing them with buffered wires endows such structures with the ability to partition
and deactivate part of them, again virtually for free. Figure 4.11 shows the main idea in
such structures.
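The Python sketch below applies the same idea to a CAM. Its entries are grouped into partitions, and a partition can be switched off as a unit, which is what a partitioned, buffered bitline or matchline would allow. This is a behavioral model only. The class `PartitionedCAM` is hypothetical, and the count of searched entries serves as a rough stand-in for search energy, since a real CAM compares all active entries in parallel. Entries in a partition that is switched off are assumed to be invalidated.

```python
from typing import List, Optional, Tuple


class PartitionedCAM:
    """CAM whose entries are grouped into equal partitions (illustrative sketch).

    Assumption: disabled partitions drop out of both the capacity and the
    search, so each lookup activates fewer entries.
    """

    def __init__(self, num_partitions: int, entries_per_partition: int) -> None:
        self.num_partitions = num_partitions
        self.entries_per_partition = entries_per_partition
        self.tags: List[Optional[int]] = [None] * (num_partitions * entries_per_partition)
        self.enabled = num_partitions

    def capacity(self) -> int:
        return self.enabled * self.entries_per_partition

    def resize(self, enabled_partitions: int) -> None:
        if not 1 <= enabled_partitions <= self.num_partitions:
            raise ValueError("enabled_partitions out of range")
        # Contents of partitions being switched off are lost (invalidated).
        for i in range(enabled_partitions * self.entries_per_partition, len(self.tags)):
            self.tags[i] = None
        self.enabled = enabled_partitions

    def write(self, index: int, tag: int) -> None:
        if not 0 <= index < self.capacity():
            raise IndexError("entry lies in a disabled partition")
        self.tags[index] = tag

    def lookup(self, tag: int) -> Tuple[Optional[int], int]:
        """Return (index of first match or None, number of entries searched)."""
        active = self.tags[: self.capacity()]
        match = next((i for i, t in enumerate(active) if t == tag), None)
        return match, len(active)


if __name__ == "__main__":
    cam = PartitionedCAM(num_partitions=4, entries_per_partition=8)
    cam.write(3, 0xAB)
    cam.write(20, 0xCD)
    print(cam.lookup(0xCD))  # (20, 32): full CAM searched
    cam.resize(2)
    print(cam.lookup(0xCD))  # (None, 16): entry 20 was in a disabled partition
```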
Albonesi's target was to devise a complexity-adaptive architecture in which various structures
can be resized to provide a size versus speed trade-off. This trade-off is adjusted to best suit
application needs. If, for example, an application would benefit from larger structures, even at
the expense of a slower clock, a complexity-adaptive architecture would adjust to provide this
trade-off. Conversely, applications that have no need for large structures but whose performance
is tied to clock speed benefit from scaling down resizable structures to achieve faster clocks. In
practice, varying the clock fast enough can be challenging. Moreover, slowing down everything
just to increase the size of a single structure can quickly bite into any potential benefits. For these
reasons, rather than slowing down the clock, the latency (in cycles) of the resized structures
could be increased instead.
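The Python sketch below contrasts the two options described above for a single enlarged structure. The first option stretches the clock period until the structure's access fits in one cycle. The second keeps the clock and charges extra cycles only on accesses to that structure. It rests on simplifying assumptions. The structure nominally takes one cycle, every extra cycle of latency is fully exposed, and `total_cycles` already reflects whatever benefit (e.g., fewer misses) the larger size brings. All function names are hypothetical. Times are in integer picoseconds to keep the ceiling division exact.

```python
def latency_cycles(access_ps: int, period_ps: int) -> int:
    """Whole cycles a structure needs at a fixed clock period (ceiling division)."""
    return -(-access_ps // period_ps)


def time_with_slower_clock(total_cycles: int, access_ps: int, base_period_ps: int) -> int:
    """Stretch the clock until the resized structure fits in one cycle.

    Every cycle of the program pays for the slower structure.
    """
    return total_cycles * max(base_period_ps, access_ps)


def time_with_extra_latency(total_cycles: int, accesses: int,
                            access_ps: int, base_period_ps: int) -> int:
    """Keep the clock and let the resized structure take more cycles.

    Assumes the structure normally takes one cycle and that each extra cycle
    of latency is fully exposed (no overlap with other work).
    """
    extra = latency_cycles(access_ps, base_period_ps) - 1
    return (total_cycles + accesses * extra) * base_period_ps


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the arithmetic.
    cycles, accesses, access_ps, period_ps = 1000, 200, 900, 500
    print(time_with_slower_clock(cycles, access_ps, period_ps))             # 900000
    print(time_with_extra_latency(cycles, accesses, access_ps, period_ps))  # 600000
```

In this example only one cycle in five touches the enlarged structure. Adding a cycle to those accesses therefore costs far less than slowing every cycle. The outcome would shift if the structure were accessed on most cycles.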
Albonesi gives convincing examples by adapting two separate structures, the cache
hierarchy and the instruction queue, to the needs of various applications. He did this, however,
by examining all possible configurations for these structures and for the application as a
whole, although he also mentions finer-grain adaptability at the end. In other words, he