[Figure: four entries (A, B, C, D) per set, each with a Valid bit, a Tag field, and a Data field; sets are numbered 0 through 2047.]
Figure 4-39. A four-way set-associative cache.
The use of a set-associative cache presents the designer with a choice. When a new entry is to be brought into the cache, which of the present items should be discarded? The optimal decision, of course, requires a peek into the future, but a pretty good algorithm for most purposes is LRU (Least Recently Used). This algorithm keeps an ordering of each set of locations that could be accessed from a given memory location. Whenever any of the present lines are accessed, it updates the list, marking that entry the most recently accessed. When it comes time to replace an entry, the one at the end of the list (the least recently accessed) is the one discarded.
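The LRU bookkeeping described above can be sketched in a few lines. This is a minimal illustration, not hardware pseudocode from the text; the class and method names are invented for the example. The set keeps its tags ordered from most to least recently used, every access moves the touched tag to the front, and eviction takes the tag at the end.

```python
# Hypothetical sketch of LRU replacement within one cache set.
# The names (CacheSet, access) are illustrative, not from the text.

class CacheSet:
    """One set of a set-associative cache with LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = []  # tags ordered most- to least-recently used

    def access(self, tag):
        """Return True on a hit; on a miss, insert the tag,
        evicting the least recently used line if the set is full."""
        if tag in self.lines:
            self.lines.remove(tag)      # hit: move to MRU position
            self.lines.insert(0, tag)
            return True
        if len(self.lines) == self.ways:
            self.lines.pop()            # evict LRU (end of the list)
        self.lines.insert(0, tag)       # new line is most recently used
        return False

s = CacheSet(ways=4)
for t in [1, 2, 3, 4, 1]:
    s.access(t)
s.access(5)        # set is full; tag 2 is least recently used and is evicted
print(s.lines)     # [5, 1, 4, 3]
```

Note that the ordering is updated on every access, hit or miss, which is exactly the cost that makes this scheme impractical at very high associativity.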
Carried to the extreme, a 2048-way cache containing a single set of 2048 line entries is also possible. Here all memory addresses map onto the single set, so the lookup requires comparing the address against all 2048 tags in the cache. Note that each entry must now have tag-matching logic. Since the LINE field is of 0 length, the TAG field is the entire address except for the WORD and BYTE fields. Furthermore, when a cache line is replaced, all 2048 locations are possible candidates for replacement. Maintaining an ordered list of 2048 entries requires a great deal of bookkeeping, making LRU replacement infeasible. (Remember that this list has to be updated on every memory operation, not just on a miss.) Surprisingly, high-associativity caches do not improve performance much over low-associativity caches under most circumstances, and in some cases actually perform worse. For these reasons, set associativity beyond four-way is relatively unusual.
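The effect of associativity on the address fields can be made concrete with a small sketch. The field widths below are assumptions chosen for illustration (a 32-bit address with 2-bit BYTE and WORD fields), not values given in the text; the point is only that shrinking the LINE field to 0 bits, as in the fully associative case, makes the TAG field absorb everything above WORD and BYTE.

```python
# Illustrative only: the field widths are assumptions, not from the text.
# Split an address into TAG / LINE / WORD / BYTE fields.

def split_address(addr, line_bits, word_bits=2, byte_bits=2):
    byte = addr & ((1 << byte_bits) - 1)
    word = (addr >> byte_bits) & ((1 << word_bits) - 1)
    line = (addr >> (byte_bits + word_bits)) & ((1 << line_bits) - 1)
    tag  = addr >> (byte_bits + word_bits + line_bits)
    return tag, line, word, byte

addr = 0x1234ABCD
# Four-way set-associative with 2048 sets: an 11-bit LINE field.
print(split_address(addr, line_bits=11))
# Fully associative: a single set means a 0-bit LINE field,
# so the TAG is the entire address above WORD and BYTE.
print(split_address(addr, line_bits=0))
```

With `line_bits=0` the comparison hardware gets no help from indexing: every one of the 2048 tags must be checked against this full TAG on every lookup.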
Finally, writes pose a special problem for caches. When a processor writes a word, and the word is in the cache, it obviously must either update the word or discard the cache entry. Nearly all designs update the cache. But what about updating the copy in main memory? This operation can be deferred until later, when the cache line is ready to be replaced by the LRU algorithm. This choice is difficult,