In recent designs, there are three other factors that have led to the use of higher associativity
in first-level caches. First, many processors take at least two clock cycles to access the cache
and thus the impact of a longer hit time may not be critical. Second, to keep the TLB out of
the critical path (a delay that would be larger than that associated with increased associativ-
ity), almost all L1 caches should be virtually indexed. This limits the size of the cache to the
page size times the associativity, because then only the bits within the page are used for the
index. There are other solutions to the problem of indexing the cache before address translation is completed, but increasing the associativity, which also has other benefits, is the most attractive. Third, with the introduction of multithreading (see Chapter 3), conflict misses can increase, making higher associativity more attractive.
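The size limit imposed by virtual indexing is easy to check with a quick calculation. The following sketch assumes 4 KiB pages, a common value that is not stated in the text:

```python
# Maximum virtually indexed L1 cache size = page size x associativity,
# because only the page-offset bits are available for the index before
# address translation completes.
# Assumed value for illustration: 4 KiB pages.
page_size = 4096

for associativity in (1, 2, 4, 8):
    max_cache = page_size * associativity
    print(f"{associativity}-way: up to {max_cache // 1024} KiB")
```

With 4 KiB pages, an 8-way set associative L1 can therefore be at most 32 KiB, which is one reason modern L1 caches pair modest sizes with high associativity.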
Second Optimization: Way Prediction To Reduce Hit Time
Another approach reduces conflict misses and yet maintains the hit speed of a direct-mapped cache. In way prediction, extra bits are kept in the cache to predict the way, or block within the set, of the next cache access. This prediction means the multiplexor is set early to select the desired block, and only a single tag comparison is performed that clock cycle in parallel with reading the cache data. A miss results in checking the other blocks for matches in the next clock cycle.
Added to each block of a cache are block predictor bits. The bits select which of the blocks
to try on the next cache access. If the predictor is correct, the cache access latency is the fast hit
time. If not, it tries the other block, changes the way predictor, and has a latency of one extra
clock cycle. Simulations suggest that way prediction accuracy is in excess of 90% for a two-way set associative cache and 80% for a four-way set associative cache, with better accuracy on I-caches than D-caches. Way prediction yields a lower average memory access time for a two-way set associative cache if the fast (correctly predicted) hit is at least 10% faster than a regular hit, which is quite likely. Way prediction was first
used in the MIPS R10000 in the mid-1990s. It is popular in processors that use two-way set
associativity and is used in the ARM Cortex-A8 with four-way set associative caches. For very
fast processors, it may be challenging to implement the one cycle stall that is critical to keeping
the way prediction penalty small.
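The mechanism above can be sketched behaviorally. This is illustrative Python, not a model of any real microarchitecture; the class name `WayPredictedCache` and the fixed 1-cycle/2-cycle latencies are assumptions chosen to match the text's description:

```python
# Behavioral sketch of way prediction in a set associative cache.
# Each set keeps predictor bits naming the way to try first; a correct
# prediction costs the fast hit time (1 cycle here), while a hit in a
# different way costs one extra cycle and retrains the predictor.
class WayPredictedCache:
    def __init__(self, num_sets, ways):
        self.ways = ways
        self.tags = [[None] * ways for _ in range(num_sets)]
        self.predicted_way = [0] * num_sets  # predictor bits per set

    def access(self, set_index, tag):
        """Return hit latency in cycles, or None on a cache miss."""
        guess = self.predicted_way[set_index]
        if self.tags[set_index][guess] == tag:
            return 1                      # fast hit: prediction correct
        for way in range(self.ways):      # check other ways next cycle
            if self.tags[set_index][way] == tag:
                self.predicted_way[set_index] = way   # retrain predictor
                return 2                  # slow hit: one extra cycle
        return None                       # miss: handled by lower levels

cache = WayPredictedCache(num_sets=2, ways=2)
cache.tags[0][1] = 0xABC
print(cache.access(0, 0xABC))  # 2: wrong way tried first, predictor updated
print(cache.access(0, 0xABC))  # 1: fast hit on the retrained prediction
```

Note that only one tag comparison happens on the fast path; the full search of the set is deferred to the extra cycle, which is exactly the trade the text describes.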
An extended form of way prediction can also be used to reduce power consumption by us-
ing the way prediction bits to decide which cache block to actually access (the way predic-
tion bits are essentially extra address bits); this approach, which might be called way selection ,
saves power when the way prediction is correct but adds significant time on a way misprediction, since the access itself, not just the tag match and selection, must be repeated. Such an optimization is likely to make sense only in low-power processors. Inoue, Ishihara, and Murakami [1999] estimated that using the way selection approach with a four-way set associative cache increases the average access time by a factor of 1.04 for the I-cache and 1.13 for the D-cache on the SPEC95 benchmarks, but it yields average cache power consumption, relative to a normal four-way set associative cache, of 0.28 for the I-cache and 0.35 for the D-cache. One significant drawback of way selection is that it makes it difficult to pipeline the cache access.
Example
Assume that there are half as many D-cache accesses as I-cache accesses, and that the I-cache and D-cache are responsible for 25% and 15%, respectively, of the processor's power consumption in a normal four-way set associative implementation.
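The rest of the example statement does not appear here, but the given power shares combine naturally with the Inoue, Ishihara, and Murakami [1999] factors quoted above. As a hedged sketch, assuming the question asks what fraction of total processor power remains after applying way selection (0.28 for the I-cache, 0.35 for the D-cache, everything else unchanged):

```python
# Sketch: relative total processor power with way selection applied,
# using the power shares given in the example and the cache power
# factors from Inoue, Ishihara, and Murakami [1999] quoted above.
# The combination of the two is an assumption about what the
# (truncated) question asks.
i_cache_share = 0.25   # fraction of total power (given)
d_cache_share = 0.15   # fraction of total power (given)
other_share = 1.0 - i_cache_share - d_cache_share

relative_power = (i_cache_share * 0.28 +
                  d_cache_share * 0.35 +
                  other_share * 1.0)
print(f"relative power: {relative_power:.4f}")  # 0.7225, roughly 28% savings
```

Under these assumptions total power falls to about 72% of the baseline, though the full example would also need to weigh the increased access times (1.04 and 1.13) against this saving.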