First Optimization: Small And Simple First-Level Caches To
Reduce Hit Time And Power
The pressure of both a fast clock cycle and power limitations encourages limited size for first-level caches. Similarly, use of lower levels of associativity can reduce both hit time and power, although such trade-offs are more complex than those involving size.
The critical timing path in a cache hit is the three-step process of addressing the tag memory
using the index portion of the address, comparing the read tag value to the address, and
setting the multiplexor to choose the correct data item if the cache is set associative. Direct-
mapped caches can overlap the tag check with the transmission of the data, effectively redu-
cing hit time. Furthermore, lower levels of associativity will usually reduce power because
fewer cache lines must be accessed.
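The three-step lookup above can be made concrete by showing how an address is split into the fields it uses. This is a minimal sketch: the 32 KB size and 64-byte block are illustrative choices, not values from the text, and the function name is our own.

```python
# Sketch of the address split behind the three-step cache lookup:
# the index selects a set, the tag is compared against the stored tags,
# and the offset picks the byte within the block. Sizes are illustrative.

def address_fields(addr, cache_bytes=32 * 1024, block_bytes=64, ways=1):
    """Split an address into (tag, set index, block offset)."""
    num_sets = cache_bytes // (block_bytes * ways)
    offset = addr % block_bytes
    index = (addr // block_bytes) % num_sets   # selects which set to read
    tag = addr // (block_bytes * num_sets)     # compared against stored tags
    return tag, index, offset
```

Note that doubling `ways` halves `num_sets`, so each lookup must read and compare `ways` tags; that extra reading is the power cost of higher associativity mentioned above, while a direct-mapped cache (`ways=1`) reads exactly one line.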
Although the total amount of on-chip cache has increased dramatically with new generations of microprocessors, the size of the L1 caches has recently grown only slightly or not at all, because of the clock rate impact of a larger L1 cache. In many recent processors, designers have opted for more associativity rather than larger caches. An additional consideration in choosing the associativity is the possibility of eliminating address aliases; we discuss this shortly.
One approach to determining the impact on hit time and power consumption in advance
of building a chip is to use CAD tools. CACTI is a program to estimate the access time and
energy consumption of alternative cache structures on CMOS microprocessors within 10% of
more detailed CAD tools. For a given minimum feature size, CACTI estimates the hit time of caches as cache size, associativity, number of read/write ports, and other, more complex parameters vary. Figure 2.3 shows the estimated impact on hit time as cache size and associativity are varied. Depending on cache size, for these parameters the model suggests that the hit time for
direct mapped is slightly faster than two-way set associative and that two-way set associative
is 1.2 times faster than four-way and four-way is 1.4 times faster than eight-way. Of course,
these estimates depend on technology as well as the size of the cache.
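The relative hit times implied by these ratios can be tabulated directly. The numbers below simply restate the ratios quoted from Figure 2.3, normalizing the two-way hit time to 1; they are not CACTI output for any particular cache size or technology.

```python
# Relative L1 hit times implied by the ratios quoted from Figure 2.3,
# normalizing two-way set associative to 1.0. These restate the text's
# ratios; they are not CACTI output for a specific configuration.
hit_time = {"2-way": 1.0}
hit_time["4-way"] = hit_time["2-way"] * 1.2  # two-way is 1.2x faster than four-way
hit_time["8-way"] = hit_time["4-way"] * 1.4  # four-way is 1.4x faster than eight-way
```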
Example
Using the data in Figure B.8 in Appendix B and Figure 2.3, determine whether
a 32 KB four-way set associative L1 cache has a faster memory access time than
a 32 KB two-way set associative L1 cache. Assume the miss penalty to L2 is 15
times the access time for the faster L1 cache. Ignore misses beyond L2. Which
has the faster average memory access time?
Answer
Let the access time for the two-way set associative cache be 1. Then, for the two-
way cache:
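The comparison rests on the average-memory-access-time formula, AMAT = hit time + miss rate × miss penalty. The sketch below sets up that calculation; the miss rates are placeholder values, not the actual Figure B.8 numbers, and only the structure of the comparison comes from the text.

```python
# Average memory access time: hit time + miss rate * miss penalty.
# The miss rates below are placeholders, NOT the Figure B.8 values;
# only the structure of the comparison follows the example's setup.

def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# Normalize the two-way hit time to 1. Per Figure 2.3, two-way is about
# 1.2x faster than four-way, so the four-way hit time is roughly 1.2.
# The miss penalty is 15x the faster (two-way) hit time, per the example.
two_way = amat(1.0, miss_rate=0.04, miss_penalty=15)    # hypothetical miss rate
four_way = amat(1.2, miss_rate=0.035, miss_penalty=15)  # hypothetical miss rate
```

With these placeholder miss rates the four-way cache's lower miss rate does not recover its higher hit time, but the real comparison of course depends on the actual miss rates from Figure B.8.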