In recent designs, there are three other factors that have led to the use of higher associativity
in first-level caches. First, many processors take at least two clock cycles to access the cache
and thus the impact of a longer hit time may not be critical. Second, to keep the TLB out of
the critical path (a delay that would be larger than that associated with increased associativ-
ity), almost all L1 caches should be virtually indexed. This limits the size of the cache to the
page size times the associativity, because then only the bits within the page are used for the
index. There are other solutions to the problem of indexing the cache before address translation is completed, but increasing the associativity, which also has other benefits, is the most attractive. Third, with the introduction of multithreading (see Chapter 3), conflict misses can increase, making higher associativity more attractive.
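The size limit imposed by virtual indexing is easy to check with a quick calculation. The following sketch assumes 4 KiB pages, a common value that is not stated in the text:

```python
# Maximum virtually indexed L1 cache size = page size x associativity,
# because only the page-offset bits are available for the index before
# address translation completes.
# Assumed value for illustration: 4 KiB pages.
page_size = 4096

for associativity in (1, 2, 4, 8):
    max_cache = page_size * associativity
    print(f"{associativity}-way: up to {max_cache // 1024} KiB")
```

With 4 KiB pages, an 8-way set associative L1 can therefore be at most 32 KiB, which is one reason modern L1 caches pair modest sizes with high associativity.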
Second Optimization: Way Prediction To Reduce Hit Time
Another approach reduces conflict misses and yet maintains the hit speed of a direct-mapped cache. In way prediction, extra bits are kept in the cache to predict the way, or block within the set, of the next cache access. This prediction means the multiplexor is set early to select the desired block, and only a single tag comparison is performed that clock cycle in parallel with reading the cache data. A miss results in checking the other blocks for matches in the next clock cycle.
Added to each block of a cache are block predictor bits. The bits select which of the blocks
to try on the next cache access. If the predictor is correct, the cache access latency is the fast hit
time. If not, it tries the other block, changes the way predictor, and has a latency of one extra
clock cycle. Simulations suggest that way prediction accuracy is in excess of 90% for a two-way set associative cache and 80% for a four-way set associative cache, with better accuracy on I-caches than D-caches. Way prediction yields a lower average memory access time for a two-way set associative cache if the fast (correctly predicted) hit is at least 10% faster than a regular hit, which is quite likely. Way prediction was first
used in the MIPS R10000 in the mid-1990s. It is popular in processors that use two-way set
associativity and is used in the ARM Cortex-A8 with four-way set associative caches. For very
fast processors, it may be challenging to implement the one cycle stall that is critical to keeping
the way prediction penalty small.
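The mechanism above can be sketched behaviorally. This is illustrative Python, not a model of any real microarchitecture; the class name `WayPredictedCache` and the fixed 1-cycle/2-cycle latencies are assumptions chosen to match the text's description:

```python
# Behavioral sketch of way prediction in a set associative cache.
# Each set keeps predictor bits naming the way to try first; a correct
# prediction costs the fast hit time (1 cycle here), while a hit in a
# different way costs one extra cycle and retrains the predictor.
class WayPredictedCache:
    def __init__(self, num_sets, ways):
        self.ways = ways
        self.tags = [[None] * ways for _ in range(num_sets)]
        self.predicted_way = [0] * num_sets  # predictor bits per set

    def access(self, set_index, tag):
        """Return hit latency in cycles, or None on a cache miss."""
        guess = self.predicted_way[set_index]
        if self.tags[set_index][guess] == tag:
            return 1                      # fast hit: prediction correct
        for way in range(self.ways):      # check other ways next cycle
            if self.tags[set_index][way] == tag:
                self.predicted_way[set_index] = way   # retrain predictor
                return 2                  # slow hit: one extra cycle
        return None                       # miss: handled by lower levels

cache = WayPredictedCache(num_sets=2, ways=2)
cache.tags[0][1] = 0xABC
print(cache.access(0, 0xABC))  # 2: wrong way tried first, predictor updated
print(cache.access(0, 0xABC))  # 1: fast hit on the retrained prediction
```

Note that only one tag comparison happens on the fast path; the full search of the set is deferred to the extra cycle, which is exactly the trade the text describes.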
An extended form of way prediction can also be used to reduce power consumption by us-
ing the way prediction bits to decide which cache block to actually access (the way predic-
tion bits are essentially extra address bits); this approach, which might be called way selection ,
saves power when the way prediction is correct but adds significant time on a way misprediction, since the access itself, not just the tag match and selection, must be repeated. Such an optimization is likely to make sense only in low-power processors. Inoue, Ishihara, and Murakami [1999] estimated that using the way selection approach with a four-way set associative cache increases the average access time by a factor of 1.04 for the I-cache and 1.13 for the D-cache on the SPEC95 benchmarks, but it yields average cache power consumption, relative to a normal four-way set associative cache, of 0.28 for the I-cache and 0.35 for the D-cache. One significant drawback of way selection is that it makes it difficult to pipeline the cache access.
Example
Assume that there are half as many D-cache accesses as I-cache accesses, and that the I-cache and D-cache are responsible for 25% and 15%, respectively, of the processor's power consumption in a normal four-way set associative implementation.
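The rest of the example statement does not appear here, but the given power shares combine naturally with the Inoue, Ishihara, and Murakami [1999] factors quoted above. As a hedged sketch, assuming the question asks what fraction of total processor power remains after applying way selection (0.28 for the I-cache, 0.35 for the D-cache, everything else unchanged):

```python
# Sketch: relative total processor power with way selection applied,
# using the power shares given in the example and the cache power
# factors from Inoue, Ishihara, and Murakami [1999] quoted above.
# The combination of the two is an assumption about what the
# (truncated) question asks.
i_cache_share = 0.25   # fraction of total power (given)
d_cache_share = 0.15   # fraction of total power (given)
other_share = 1.0 - i_cache_share - d_cache_share

relative_power = (i_cache_share * 0.28 +
                  d_cache_share * 0.35 +
                  other_share * 1.0)
print(f"relative power: {relative_power:.4f}")  # 0.7225, roughly 28% savings
```

Under these assumptions total power falls to about 72% of the baseline, though the full example would also need to weigh the increased access times (1.04 and 1.13) against this saving.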