FIGURE 4.9: Frequent value cache (FVC). The low-order bit array provides dictionary indices for the
compressed words or the low-order 8 bits of uncompressed words. In the latter case, a second access is
required to retrieve the 24 high-order bits. Adapted from [235].
beyond significance compression, but to go any further, complexity and power consumption
increase significantly.
Frequent value cache: The frequent value cache (FVC), built around this compression
scheme, specifically targets power consumption [235, 234]. In FVC, a cache line can contain
both compressed and uncompressed words. Their status is determined by additional bits in the
line's tag. A compressed word is simply an index into the frequent value dictionary. The index
occupies the low-order bits of the original value, leaving the rest of the word empty. Assume that
the index occupies the eight low-order bits, allowing for a 256-entry frequent-value dictionary.
The key challenge is how to structure the cache to reduce the energy cost when a
compressed word is accessed. FVC does this by splitting cache lines into two data
arrays: the first array (shown on the right in Figure 4.9) holds only the 8 low-order bits of
each word in the cache line, while the second array (shown on the left) holds the remaining
24 high-order bits of each word. Initially, only the first array is accessed, yielding only
the indices of the compressed words or the eight low-order bits of uncompressed words.
If the requested word is compressed (indicated by the corresponding tag bit), minimal
energy was spent to access exactly what was needed: the dictionary index. The dictionary is
accessed next to obtain the actual value, but this access is far less expensive than a
cache access. If the requested word is uncompressed, only its 8 low-order bits have been
obtained; the remaining 24 high-order bits are still needed, so the second array is accessed
in the subsequent cycle.
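The read path above can be modeled as follows (a sketch under stated assumptions: the array and tag layouts are simplified, and the function names are hypothetical, not taken from the FVC design):

```python
# Illustrative model of the FVC read path. Each word's storage is split
# into an 8-bit low-order array (always accessed) and a 24-bit high-order
# array (accessed only for uncompressed words, in a subsequent cycle).

def fvc_read(word_idx, tag_bits, low_array, high_array, dictionary):
    accesses = ["low_array"]             # first access: 8 low-order bits
    low8 = low_array[word_idx]
    if tag_bits[word_idx]:               # tag bit set: word is compressed
        accesses.append("dictionary")    # cheap dictionary lookup
        value = dictionary[low8]
    else:                                # uncompressed: need high 24 bits
        accesses.append("high_array")    # second array access, next cycle
        value = (high_array[word_idx] << 8) | low8
    return value, accesses

# A 4-word line: word 0 compressed (index 2), word 1 uncompressed.
dictionary = [0x00000000, 0xFFFFFFFF, 0x00000001]
tag_bits   = [True, False, True, True]
low_array  = [2, 0x78, 0, 0]
high_array = [0, 0x123456, 0, 0]

fvc_read(0, tag_bits, low_array, high_array, dictionary)
# returns (0x00000001, ['low_array', 'dictionary'])
fvc_read(1, tag_bits, low_array, high_array, dictionary)
# returns (0x12345678, ['low_array', 'high_array'])
```

The two return paths make the energy trade-off explicit: a compressed access touches only the narrow low-order array plus the small dictionary, while an uncompressed access pays for both arrays.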
Both the dictionary access and the access to the second array increase cache latency,
but the dictionary access much less so. Yang and Gupta report an overall increase of about
3% in the execution time of SPEC95 benchmarks but at the same time a 29% reduction of