Thread-Level Parallelism - Computer Architecture: A Quantitative Approach

FIGURE 5.18 The components of the kernel data miss rate change as the L1 data cache size is increased from 32 KB to 256 KB, when the multiprogramming workload is run on eight processors. The compulsory miss rate component stays constant, since it is unaffected by cache size. The capacity component drops by more than a factor of 2, while the coherence component nearly doubles. The increase in coherence misses occurs because the probability of a miss being caused by an invalidation increases with cache size, since fewer entries are bumped due to capacity.

As we would expect, increasing the block size of the L1 data cache substantially reduces the compulsory miss rate in the kernel references. It also has a significant impact on the capacity miss rate, decreasing it by a factor of 2.4 over the range of block sizes. The increased block size yields a small reduction in coherence traffic, which appears to stabilize at 64 bytes, with no change in the coherence miss rate in going to 128-byte lines. Because there are no significant reductions in the coherence miss rate as the block size increases, the fraction of the miss rate due to coherence grows from about 7% to about 15%.
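The growth in the coherence share can be illustrated with a small sketch. The component values below are hypothetical and are not read from the figure; they are chosen only so that the capacity component falls by a factor of 2.4 and the resulting shares land near the 7% and 15% quoted above. The function name `coherence_fraction` is likewise an assumption made for illustration.

```python
def coherence_fraction(compulsory, capacity, coherence):
    """Return the share of the total miss rate that is due to coherence misses."""
    total = compulsory + capacity + coherence
    return coherence / total


# Hypothetical miss-rate components (in percent), NOT data from the figure.
# Compulsory and capacity shrink substantially with larger blocks,
# while coherence shrinks only slightly.
small_block = coherence_fraction(compulsory=3.0, capacity=3.6, coherence=0.5)
large_block = coherence_fraction(compulsory=0.8, capacity=1.5, coherence=0.4)

print(f"coherence share, small blocks: {small_block:.0%}")
print(f"coherence share, large blocks: {large_block:.0%}")
```

Even though the coherence component itself barely moves in this sketch, its share of the total roughly doubles because the other two components fall much faster.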

If we examine the number of bytes needed per data reference, as in Figure 5.19, we see that the kernel has a higher traffic ratio that grows with block size. It is easy to see why this occurs: When going from a 16-byte block to a 128-byte block, the miss rate drops by a factor of about 3.7, but the number of bytes transferred per miss increases by a factor of 8, so the total miss traffic increases by just over a factor of 2. The user program's traffic also more than doubles as the block size goes from 16 to 128 bytes, but it starts out at a much lower level.
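The arithmetic behind the kernel traffic growth can be checked with a short sketch. It assumes that miss traffic per data reference is simply the miss rate multiplied by the block size, so that the relative change in traffic is the block-size growth divided by the miss-rate reduction. The function name `traffic_growth` is hypothetical.

```python
def traffic_growth(old_block_bytes, new_block_bytes, miss_rate_reduction):
    """Relative change in miss traffic per reference.

    Assumes traffic per reference = miss rate * block size, so the ratio
    new/old is (block-size growth) / (miss-rate reduction factor).
    """
    block_growth = new_block_bytes / old_block_bytes
    return block_growth / miss_rate_reduction


# Kernel figures quoted in the text: 16-byte to 128-byte blocks,
# with the miss rate dropping by a factor of about 3.7.
kernel_growth = traffic_growth(16, 128, 3.7)
print(f"kernel miss traffic grows by a factor of {kernel_growth:.2f}")
```

Under that assumption the result is 8 / 3.7, which agrees with the "just over a factor of 2" stated above.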
