them on a circuit board alongside the processor. This is why the Pentium II was designed as a
cartridge rather than what looked like a chip.
One problem was the speed of the available third-party cache chips. The fastest ones on the market
were rated at 3ns or slower, meaning 333MHz or less. Because the Pentium II and initial Pentium III
processors were already being driven at speeds above that, Intel had to run the L2 cache at
half the processor speed because that was all the commercially available cache memory could handle.
AMD followed suit with the Athlon processor, which had to drop L2 cache speed even further in
some models, to two-fifths or one-third of the main CPU speed, to keep the cache memory within
the 333MHz limit of commercially available chips.
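The relationship between a chip's cycle time and its clock rate, and the cache ratio that results, can be checked with a little arithmetic. The following is a minimal sketch: the function names and the sample core speeds (450MHz, 800MHz, 900MHz) are illustrative assumptions, and the list of ratios is limited to the ones mentioned above.

```python
from fractions import Fraction

def ns_to_mhz(cycle_ns):
    """Convert a cycle time in nanoseconds to a clock rate in MHz."""
    return 1000.0 / cycle_ns

# Ratios named in the text, ordered from fastest to slowest:
# full speed, one-half, two-fifths, one-third.
RATIOS = [Fraction(1, 1), Fraction(1, 2), Fraction(2, 5), Fraction(1, 3)]

def fastest_allowed_ratio(core_mhz, cache_limit_mhz):
    """Return the largest listed ratio that keeps the cache at or
    below its rated speed, or None if no listed ratio is slow enough."""
    for ratio in RATIOS:
        if core_mhz * ratio <= cache_limit_mhz:
            return ratio
    return None

limit_mhz = ns_to_mhz(3.0)  # a 3ns chip tops out at about 333MHz

# Hypothetical core speeds, chosen only to exercise each ratio.
for core_mhz in (450, 800, 900):
    ratio = fastest_allowed_ratio(core_mhz, limit_mhz)
    cache_mhz = float(core_mhz * ratio)
    print(f"{core_mhz}MHz core: L2 at {ratio} speed = {cache_mhz:.0f}MHz")
```

Under these assumptions, the faster the core, the smaller the ratio must be to stay under the 333MHz ceiling, which is the squeeze described above.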
Then a breakthrough occurred, which first appeared in Celeron processors 300A and above. These
had 128KB of L2 cache, but no external chips were used. Instead, the L2 cache had been integrated
directly into the processor core just like the L1. Consequently, both the L1 and L2 caches now ran
at full processor speed and, more importantly, would scale up in speed as processor speeds increased
in the future. In the newer Pentium III, as well as all the Xeon and Celeron processors, the L2 cache
runs at full processor core speed, which means there is no waiting or slowing down after an L1 cache
miss. AMD also achieved full-core speed on-die cache in its later Athlon and Duron chips. Using
on-die cache improves performance dramatically because, during the 9% of the time the system uses
the L2, it now remains at full speed instead of slowing down to one-half or less of the processor
speed or, even worse, slowing down to motherboard speed as in Socket 7 designs. Another benefit of on-die L2 cache is
cost, which is less because fewer parts are involved. L3 on-die caches offer the same benefits for
those times when L1 and L2 cache do not contain the desired data. And, because L3 cache is much
larger than L2 cache (6MB in AMD Phenom II and 12MB in Core i7 Extreme Edition), the odds of all
three cache levels not containing the desired information are lower than in processors that have
only L1 and L2 cache.
Let's revisit the restaurant analogy using a 3.6GHz processor. You would now
be taking a bite just over every quarter second (3.6GHz = 0.28ns cycling). The L1 cache would also be running at
that speed, so you could eat anything on your table at that same rate (the table = L1 cache). The real
jump in speed comes when you want something that isn't already on the table (L1 cache miss), in
which case the waiter reaches over to the cart (which is now directly adjacent to the table) and 9 out
of 10 times is able to find the food you want in just over one-quarter second (L2 speed = 3.6GHz or
0.28ns cycling). In this system, you would run at 3.6GHz 99% of the time (L1 and L2 hit ratios
combined) and slow down to RAM speed (wait for the kitchen) only 1% of the time, as before. With
faster memory running at 800MHz (1.25ns), you would have to wait only 1.25 seconds for the food to
come from the kitchen. If only restaurant performance would increase at the same rate processor
performance has!
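The hit-ratio arithmetic behind the analogy can be written out as a weighted average. The sketch below assumes a 90% L1 hit ratio and a 90% L2 hit ratio on whatever misses L1, which is what gives the 9% and 99% figures above. It also treats a RAM access as a single 1.25ns cycle, a simplification that ignores real memory latency. The function and variable names are illustrative.

```python
def cycle_ns(freq_ghz):
    """Cycle time in nanoseconds for a clock rate in GHz."""
    return 1.0 / freq_ghz

def access_shares(l1_hit, l2_hit):
    """Split all accesses into the fraction served by L1, L2, and RAM.
    l2_hit is the hit ratio for requests that already missed in L1."""
    l1 = l1_hit
    l2 = (1 - l1_hit) * l2_hit
    ram = (1 - l1_hit) * (1 - l2_hit)
    return l1, l2, ram

core_ns = cycle_ns(3.6)  # L1 and on-die L2 both run at core speed
ram_ns = cycle_ns(0.8)   # 800MHz memory

l1, l2, ram = access_shares(0.90, 0.90)

# Weighted average cycle time across all three sources.
average_ns = l1 * core_ns + l2 * core_ns + ram * ram_ns

print(f"L1 {l1:.0%}, L2 {l2:.0%}, RAM {ram:.0%}")
print(f"core cycle {core_ns:.2f}ns, RAM cycle {ram_ns:.2f}ns")
print(f"average cycle {average_ns:.3f}ns")
```

Because the slow path is taken only about 1% of the time in this model, the average stays close to the core cycle time.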
Cache Organization
You know that cache stores copies of data from various main memory addresses. Because the cache
cannot hold copies of the data from all the addresses in main memory simultaneously, there has to be
a way to know which addresses are currently copied into the cache so that, if we need data from those
addresses, it can be read from the cache rather than from the main memory. This function is performed
by Tag RAM, which is additional memory in the cache that holds an index of the addresses that are
copied into the cache. Each line of cache memory has a corresponding address tag that stores the main
memory address of the data currently copied into that particular cache line. If data from a particular
main memory address is needed, the cache controller can quickly search the address tags to see
whether the requested address is currently being stored in the cache (a hit) or not (a miss). If the data
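The tag check described above can be sketched in a few lines. This is only an illustration under stated assumptions: it models a direct-mapped cache (each memory block can live in exactly one cache line), which the text does not specify, and the line size, line count, and class name are hypothetical. It also stores only the upper part of the address as the tag, since the line's position already implies the rest.

```python
LINE_SIZE = 64  # bytes per cache line (assumed)
NUM_LINES = 8   # number of cache lines (assumed, tiny for illustration)

class TinyCache:
    """Tracks which memory blocks are cached; the data itself is omitted."""

    def __init__(self):
        # One tag per cache line; None means the line is empty.
        self.tags = [None] * NUM_LINES

    def access(self, address):
        """Return 'hit' or 'miss'; on a miss, record the new tag."""
        block = address // LINE_SIZE  # memory block containing the address
        index = block % NUM_LINES     # the one line this block may occupy
        tag = block // NUM_LINES      # distinguishes blocks sharing that line
        if self.tags[index] == tag:
            return "hit"
        self.tags[index] = tag        # line is refilled from main memory
        return "miss"

cache = TinyCache()
results = [cache.access(address) for address in (0, 32, 64, 512, 0)]
print(results)
```

In this model, addresses 0 and 32 share a line, so the second access hits. Address 512 maps to the same line as address 0 with a different tag, so it replaces that entry, and the final access to address 0 misses again.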
 