Least recently used (LRU)—To reduce the chance of throwing out information that will be
needed soon, accesses to blocks are recorded. Relying on the past to predict the future, the
block replaced is the one that has been unused for the longest time. LRU relies on a corol-
lary of locality: If recently used blocks are likely to be used again, then a good candidate
for disposal is the least recently used block.
First in, first out (FIFO)—Because LRU can be complicated to calculate, FIFO approximates
LRU by replacing the oldest block rather than the least recently used one.
A virtue of random replacement is that it is simple to build in hardware. As the number
of blocks to keep track of increases, LRU becomes increasingly expensive and is usually only
approximated. A common approximation (often called pseudo-LRU) has a set of bits for each
set in the cache with each bit corresponding to a single way (a way is bank in a set associat-
ive cache; there are four ways in four-way set associative cache) in the cache. When a set is
accessed, the bit corresponding to the way containing the desired block is turned on; if all the
bits associated with a set are turned on, they are reset with the exception of the most recently
turned on bit. When a block must be replaced, the processor chooses a block from a way
whose bit is turned off, often randomly if more than one choice is available. This approximates
LRU, since the block that is replaced will not have been accessed since the last time that all the
blocks in the set were accessed. Figure B.4 shows the difference in miss rates between LRU,
random, and FIFO replacement.
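The bit-per-way scheme above can be sketched as follows for a single set of a four-way set-associative cache; the class and method names are illustrative, not from the text.

```python
import random

class PseudoLRUSet:
    """One set of a 4-way set-associative cache using the bit-per-way
    pseudo-LRU approximation: one 'recently used' bit per way."""

    def __init__(self, ways=4):
        self.ways = ways
        self.used = [False] * ways  # one bit per way, all initially off

    def access(self, way):
        """Turn on the bit for the accessed way; if all bits are now on,
        reset them all except the most recently turned-on bit."""
        self.used[way] = True
        if all(self.used):
            self.used = [False] * self.ways
            self.used[way] = True

    def victim(self):
        """Pick a way to replace: any way whose bit is off,
        chosen at random when more than one is available."""
        candidates = [w for w in range(self.ways) if not self.used[w]]
        return random.choice(candidates)
```

After all four ways have been touched, only the most recently used way keeps its bit on, so the victim is guaranteed not to have been accessed since the last time the whole set was accessed, which is exactly the approximation the text describes.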
FIGURE B.4 Data cache misses per 1000 instructions comparing least recently used,
random, and first in, first out replacement for several sizes and associativities. There is
little difference between LRU and random for the largest size cache, with LRU outperforming
the others for smaller caches. FIFO generally outperforms random in the smaller cache sizes.
These data were collected for a block size of 64 bytes for the Alpha architecture using 10
SPEC2000 benchmarks. Five are from SPECint2000 (gap, gcc, gzip, mcf, and perl) and five
are from SPECfp2000 (applu, art, equake, lucas, and swim). We will use this computer and
these benchmarks in most figures in this appendix.
Q4: What Happens on a Write?
Reads dominate processor cache accesses. All instruction accesses are reads, and most instruc-
tions don't write to memory. Figures A.32 and A.33 in Appendix A suggest a mix of 10% stores
and 26% loads for MIPS programs, making writes 10%/(100% + 26% + 10%) or about 7% of
the overall memory traffic. Of the data cache traffic, writes are 10%/(26% + 10%) or about 28%.
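The traffic fractions quoted above follow from treating every instruction fetch as one read alongside the 26% load / 10% store data mix; a quick check:

```python
# Per 100 instructions: 100 instruction-fetch reads, 26 loads, 10 stores
# (the MIPS mix cited from Appendix A).
fetches, loads, stores = 100, 26, 10

write_frac_total = stores / (fetches + loads + stores)  # share of all memory traffic
write_frac_data = stores / (loads + stores)             # share of data cache traffic

print(f"writes / overall traffic:    {write_frac_total:.1%}")  # about 7%
print(f"writes / data cache traffic: {write_frac_data:.1%}")   # about 28%
```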
Making the common case fast means optimizing caches for reads, especially since processors