First Optimization: Small And Simple First-Level Caches To
Reduce Hit Time And Power
The pressure of both a fast clock cycle and power limitations encourages limited size for first-level caches. Similarly, use of lower levels of associativity can reduce both hit time and power, although such trade-offs are more complex than those involving size.
The critical timing path in a cache hit is the three-step process of addressing the tag memory
using the index portion of the address, comparing the read tag value to the address, and
setting the multiplexor to choose the correct data item if the cache is set associative. Direct-
mapped caches can overlap the tag check with the transmission of the data, effectively redu-
cing hit time. Furthermore, lower levels of associativity will usually reduce power because
fewer cache lines must be accessed.
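The three-step lookup above can be made concrete by showing how an address is split into the fields it uses. This is a minimal sketch: the 32 KB size and 64-byte block are illustrative choices, not values from the text, and the function name is our own.

```python
# Sketch of the address split behind the three-step cache lookup:
# the index selects a set, the tag is compared against the stored tags,
# and the offset picks the byte within the block. Sizes are illustrative.

def address_fields(addr, cache_bytes=32 * 1024, block_bytes=64, ways=1):
    """Split an address into (tag, set index, block offset)."""
    num_sets = cache_bytes // (block_bytes * ways)
    offset = addr % block_bytes
    index = (addr // block_bytes) % num_sets   # selects which set to read
    tag = addr // (block_bytes * num_sets)     # compared against stored tags
    return tag, index, offset
```

Note that doubling `ways` halves `num_sets`, so each lookup must read and compare `ways` tags; that extra reading is the power cost of higher associativity mentioned above, while a direct-mapped cache (`ways=1`) reads exactly one line.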
Although the total amount of on-chip cache has increased dramatically with new generations of microprocessors, the size of the L1 caches has recently grown only slightly or not at all, because of the clock rate impact of a larger L1 cache. In many recent processors, designers have opted for more associativity rather than larger caches. An additional consideration in choosing the associativity is the possibility of eliminating address aliases; we discuss this shortly.
One approach to determining the impact on hit time and power consumption in advance
of building a chip is to use CAD tools. CACTI is a program to estimate the access time and
energy consumption of alternative cache structures on CMOS microprocessors within 10% of
more detailed CAD tools. For a given minimum feature size, CACTI estimates the hit time of caches as cache size, associativity, number of read/write ports, and other, more complex parameters vary. Figure 2.3 shows the estimated impact on hit time as cache size and associativity are varied. Depending on cache size, for these parameters the model suggests that the hit time for
direct mapped is slightly faster than two-way set associative and that two-way set associative
is 1.2 times faster than four-way and four-way is 1.4 times faster than eight-way. Of course,
these estimates depend on technology as well as the size of the cache.
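The relative hit times implied by these ratios can be tabulated directly. The numbers below simply restate the ratios quoted from Figure 2.3, normalizing the two-way hit time to 1; they are not CACTI output for any particular cache size or technology.

```python
# Relative L1 hit times implied by the ratios quoted from Figure 2.3,
# normalizing two-way set associative to 1.0. These restate the text's
# ratios; they are not CACTI output for a specific configuration.
hit_time = {"2-way": 1.0}
hit_time["4-way"] = hit_time["2-way"] * 1.2  # two-way is 1.2x faster than four-way
hit_time["8-way"] = hit_time["4-way"] * 1.4  # four-way is 1.4x faster than eight-way
```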
Example
Using the data in Figure B.8 in Appendix B and Figure 2.3, determine whether
a 32 KB four-way set associative L1 cache has a faster memory access time than
a 32 KB two-way set associative L1 cache. Assume the miss penalty to L2 is 15
times the access time for the faster L1 cache. Ignore misses beyond L2. Which
has the faster average memory access time?
Answer
Let the access time for the two-way set associative cache be 1. Then, for the two-
way cache:
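The comparison rests on the average-memory-access-time formula, AMAT = hit time + miss rate × miss penalty. The sketch below sets up that calculation; the miss rates are placeholder values, not the actual Figure B.8 numbers, and only the structure of the comparison comes from the text.

```python
# Average memory access time: hit time + miss rate * miss penalty.
# The miss rates below are placeholders, NOT the Figure B.8 values;
# only the structure of the comparison follows the example's setup.

def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# Normalize the two-way hit time to 1. Per Figure 2.3, two-way is about
# 1.2x faster than four-way, so the four-way hit time is roughly 1.2.
# The miss penalty is 15x the faster (two-way) hit time, per the example.
two_way = amat(1.0, miss_rate=0.04, miss_penalty=15)    # hypothetical miss rate
four_way = amat(1.2, miss_rate=0.035, miss_penalty=15)  # hypothetical miss rate
```

With these placeholder miss rates the four-way cache's lower miss rate does not recover its higher hit time, but the real comparison of course depends on the actual miss rates from Figure B.8.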