FIGURE 4.9: Frequent value cache (FVC). The low-order bit array provides dictionary indices for the
compressed words or the low-order 8 bits of uncompressed words. In the latter case, a second access is
required to retrieve the 24 high-order bits. Adapted from [235].
beyond significance compression, but to go any further, complexity and power consumption
increase significantly.
Frequent value cache: The frequent value cache (FVC), built around this compression
scheme, specifically targets power consumption [235, 234]. In FVC, a cache line can contain
both compressed and uncompressed words. Their status is determined by additional bits in the
line's tag. A compressed word is simply an index into the frequent value dictionary. The index
occupies the low-order bits of the original value, leaving the rest of the word empty. Assume that
the index occupies the eight low-order bits, allowing for a 256-entry frequent-value dictionary.
The key challenge is how to structure the cache to reduce the energy cost when a
compressed word is accessed. FVC does this by splitting cache lines into two data
arrays: the first array (shown on the right in Figure 4.9) holds only the 8 low-order bits of
each word in the cache line, while the second array (shown on the left) holds the remaining
24 high-order bits of each word. Initially, only the first array is accessed, yielding only
the indices of the compressed words or the eight low-order bits of uncompressed words.
If the requested word is compressed (indicated by the corresponding tag bit), minimal
energy was spent to access exactly what was needed: the dictionary index. The dictionary is
accessed next to obtain the actual value, but this access is far less expensive than a
cache access. If the requested word is uncompressed, only its 8 low-order bits have been
obtained; the remaining 24 high-order bits are still needed, so the second array is accessed
in the subsequent cycle.
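The read path above can be modeled as follows (a sketch under stated assumptions: the array and tag layouts are simplified, and the function names are hypothetical, not taken from the FVC design):

```python
# Illustrative model of the FVC read path. Each word's storage is split
# into an 8-bit low-order array (always accessed) and a 24-bit high-order
# array (accessed only for uncompressed words, in a subsequent cycle).

def fvc_read(word_idx, tag_bits, low_array, high_array, dictionary):
    accesses = ["low_array"]             # first access: 8 low-order bits
    low8 = low_array[word_idx]
    if tag_bits[word_idx]:               # tag bit set: word is compressed
        accesses.append("dictionary")    # cheap dictionary lookup
        value = dictionary[low8]
    else:                                # uncompressed: need high 24 bits
        accesses.append("high_array")    # second array access, next cycle
        value = (high_array[word_idx] << 8) | low8
    return value, accesses

# A 4-word line: word 0 compressed (index 2), word 1 uncompressed.
dictionary = [0x00000000, 0xFFFFFFFF, 0x00000001]
tag_bits   = [True, False, True, True]
low_array  = [2, 0x78, 0, 0]
high_array = [0, 0x123456, 0, 0]

fvc_read(0, tag_bits, low_array, high_array, dictionary)
# returns (0x00000001, ['low_array', 'dictionary'])
fvc_read(1, tag_bits, low_array, high_array, dictionary)
# returns (0x12345678, ['low_array', 'high_array'])
```

The two return paths make the energy trade-off explicit: a compressed access touches only the narrow low-order array plus the small dictionary, while an uncompressed access pays for both arrays.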
Both the dictionary access and the access to the second array increase cache latency,
but the dictionary access much less so. Yang and Gupta report an overall increase of about
3% in the execution time of SPEC95 benchmarks but at the same time a 29% reduction of