THE MICROARCHITECTURE LEVEL - Structured Computer Organization

Despite the complexity of the decision, access to a needed word can be
remarkably fast. As soon as the address is known, the exact location of the word is
known if it is present in the cache. This means that it is possible to read the word
out of the cache and deliver it to the processor while the tag comparison is still
determining whether this is the correct word. So the processor actually receives a
word from the cache at the same time as, or possibly even before, it knows
whether the word is the requested one.
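
The lookup just described can be sketched in a few lines of Python. This is an illustrative model only, not the hardware design: the geometry (2048 entries of 32-byte lines) is an assumption chosen because it is consistent with the 64-KB capacity mentioned in this section, and the names split_address and lookup are hypothetical. Real hardware reads the data and compares the tags in parallel; the code necessarily does the two steps one after the other.

```python
# Assumed geometry (not stated in this section): 2048 entries x 32-byte lines = 64 KB.
LINE_SIZE = 32
NUM_ENTRIES = 2048


def split_address(addr):
    """Split a byte address into (tag, line, offset) for a direct-mapped cache."""
    offset = addr % LINE_SIZE                # byte position within the cache line
    line = (addr // LINE_SIZE) % NUM_ENTRIES  # which cache entry to examine
    tag = addr // (LINE_SIZE * NUM_ENTRIES)   # identifies which memory line is stored
    return tag, line, offset


def lookup(cache, addr):
    """Return (hit, byte). cache is a list of slots, each None or (tag, data)."""
    tag, line, offset = split_address(addr)
    entry = cache[line]
    if entry is None:
        return False, None
    stored_tag, data = entry
    candidate = data[offset]   # the location is known as soon as the address is
    hit = stored_tag == tag    # hardware performs this comparison in parallel
    return hit, (candidate if hit else None)


cache = [None] * NUM_ENTRIES
cache[0] = (0, bytes(range(LINE_SIZE)))  # memory line 0 is cached in entry 0
```

With this setup, address 5 hits in entry 0, while address 65,536 + 5 selects the same entry but fails the tag comparison.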

This mapping scheme puts consecutive memory lines in consecutive cache
entries. In fact, up to 64 KB of contiguous data can be stored in the cache. However,
two lines that differ in their address by precisely 65,536 bytes or any integral
multiple of that number cannot be stored in the cache at the same time (because they
have the same LINE value). For example, if a program accesses data at location X
and next executes an instruction that needs data at location X + 65,536 (or any
other location within the same line), the second instruction will force the cache
entry to be reloaded, overwriting what was there. If this happens often enough, it
can result in poor performance. In fact, the worst-case behavior of a cache is worse
than if there were no cache at all, since each memory operation involves reading in
an entire cache line instead of just one word.
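
The conflict described above can be reproduced with a small simulation. The sketch below is a simplified model, not the book's design: it again assumes 2048 entries of 32-byte lines, assumes 4-byte words for the traffic comparison, and tracks only tags. The function name simulate_direct_mapped and the starting address X are hypothetical.

```python
# Assumed geometry: 2048 entries x 32-byte lines = 64 KB; 4-byte words (assumption).
LINE_SIZE = 32
NUM_ENTRIES = 2048
WORD_SIZE = 4


def simulate_direct_mapped(addresses):
    """Return (hits, misses) for a sequence of byte addresses."""
    tags = [None] * NUM_ENTRIES  # one stored tag per entry; None means empty
    hits = misses = 0
    for addr in addresses:
        line = (addr // LINE_SIZE) % NUM_ENTRIES
        tag = addr // (LINE_SIZE * NUM_ENTRIES)
        if tags[line] == tag:
            hits += 1
        else:
            misses += 1
            tags[line] = tag  # reload the entry, overwriting what was there
    return hits, misses


X = 0x1000
conflicting = [X, X + 65536] * 100  # same LINE value, different tags
friendly = [X, X + 32] * 100        # adjacent lines, adjacent entries

conflict_result = simulate_direct_mapped(conflicting)  # (0, 200): every access misses
friendly_result = simulate_direct_mapped(friendly)     # (198, 2): two initial misses only

# Worst-case memory traffic: a full line per miss versus one word per access.
bytes_with_cache = conflict_result[1] * LINE_SIZE   # 6400
bytes_without_cache = len(conflicting) * WORD_SIZE  # 800
```

Under these assumptions the alternating pattern moves eight times as many bytes from memory as the same accesses would without a cache, which is the worst case the text refers to.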

Direct-mapped caches are the most common kind of cache, and they perform
quite effectively, because collisions such as the one described above can be made
to occur only rarely, or not at all. For example, a very clever compiler can take
cache collisions into account when placing instructions and data in memory.
Notice that the particular case described would not occur in a system with separate
instruction and data caches, because the colliding requests would be serviced by
different caches. Thus we see a second benefit of two caches rather than one: more
flexibility in dealing with conflicting memory patterns.

Set-Associative Caches

As mentioned above, many different lines in memory compete for the same
cache slots. If a program using the cache of Fig. 4-38(a) heavily uses words at
addresses 0 and 65,536, there will be constant conflicts, with each reference
potentially evicting the other one from the cache. A solution is to allow two or more
lines in each cache entry. A cache with n possible entries for each address is called
an n-way set-associative cache. A four-way set-associative cache is illustrated in
Fig. 4-39.

A set-associative cache is inherently more complicated than a direct-mapped
cache because, although the correct set of cache entries to examine can be
computed from the memory address being referenced, a set of n cache entries must be
checked to see if the needed line is present. And they have to be checked very fast.
Nevertheless, simulations and experience show that two-way and four-way caches
perform well enough to make this extra circuitry worthwhile.
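
To illustrate why the extra entries help, the following sketch models an n-way set-associative cache and replays the conflicting pattern of addresses 0 and 65,536. Several things here are assumptions rather than details from the text: the geometry (2048 sets of 32-byte lines), the least-recently-used replacement policy, and the name simulate_set_associative. The dictionary lookup stands in for the n tag comparisons that hardware must perform in parallel.

```python
from collections import OrderedDict

# Assumed geometry: 2048 sets of 32-byte lines; each set holds up to `ways` lines.
LINE_SIZE = 32
NUM_SETS = 2048


def simulate_set_associative(addresses, ways):
    """Return (hits, misses) using LRU replacement (an assumed policy)."""
    sets = [OrderedDict() for _ in range(NUM_SETS)]  # per set: tag -> True, oldest first
    hits = misses = 0
    for addr in addresses:
        index = (addr // LINE_SIZE) % NUM_SETS  # the set is computed from the address
        tag = addr // (LINE_SIZE * NUM_SETS)
        entries = sets[index]
        if tag in entries:              # stands in for checking all n tags at once
            hits += 1
            entries.move_to_end(tag)    # mark as most recently used
        else:
            misses += 1
            if len(entries) >= ways:
                entries.popitem(last=False)  # evict the least recently used line
            entries[tag] = True
    return hits, misses


trace = [0, 65536] * 100
one_way = simulate_set_associative(trace, ways=1)  # (0, 200): behaves like direct-mapped
two_way = simulate_set_associative(trace, ways=2)  # (198, 2): both lines stay resident
```

With one way the model degenerates to the direct-mapped case and every reference misses; with two ways both lines fit in the same set, so only the first reference to each one misses.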
