FIGURE 5.15 The number of misses per 1000 instructions drops steadily as the block
size of the L3 cache is increased, making a good case for an L3 block size of at least
128 bytes. The L3 cache is 2 MB, two-way set associative.
The lack of a significant effect on the instruction miss rate is startling. If there were an
instruction-only cache with this behavior, we would conclude that the spatial locality is very
poor. In the case of a mixed (instruction and data) L3 cache, other effects such as instruction-data
conflicts may also contribute to the high instruction cache miss rate for larger blocks. Other
studies have documented the low spatial locality in the instruction stream of large database and
OLTP workloads, which have lots of short basic blocks and special-purpose code sequences. Based on
these data, the miss penalty that a larger-block L3 can tolerate while still performing as well as
the 32-byte-block L3 can be expressed as a multiplier on the 32-byte block miss penalty:
Block size    Miss penalty relative to 32-byte block miss penalty
64 bytes      1.19
128 bytes     1.36
256 bytes     1.52
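The multipliers follow from a break-even condition. If L3 miss stall time is modeled simply as (number of L3 misses) × (miss penalty), then a block size B matches the 32-byte configuration when misses_B × penalty_B = misses_32 × penalty_32, so the largest tolerable multiplier is misses_32 / misses_B. The sketch below assumes that simple model (it ignores any other effect block size has on execution time) and inverts the table to recover the implied L3 miss count relative to the 32-byte cache. The function names `break_even_multiplier` and `implied_miss_ratio` are hypothetical, chosen for illustration.

```python
# Break-even miss-penalty multipliers from the table, relative to 32-byte blocks.
PENALTY_MULTIPLIER = {64: 1.19, 128: 1.36, 256: 1.52}


def break_even_multiplier(misses_32: float, misses_b: float) -> float:
    """Largest miss-penalty multiplier that block size B can tolerate.

    Assumes L3 stall time = misses * penalty, so B breaks even when
    misses_b * (m * penalty_32) == misses_32 * penalty_32, i.e. m = misses_32 / misses_b.
    """
    return misses_32 / misses_b


def implied_miss_ratio(multiplier: float) -> float:
    """Invert the break-even condition: misses_B / misses_32 = 1 / multiplier."""
    return 1.0 / multiplier


if __name__ == "__main__":
    for block, m in PENALTY_MULTIPLIER.items():
        ratio = implied_miss_ratio(m)
        print(f"{block:>3}-byte blocks: {ratio:.3f}x the 32-byte miss count "
              f"({(1 - ratio) * 100:.1f}% fewer misses)")
```

Read this way, the 1.36 multiplier for 128-byte blocks corresponds to roughly 26% fewer L3 misses than the 32-byte cache, under the assumption that miss count times penalty is the only thing that changes.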
With modern DDR SDRAMs that make block access fast, these numbers seem attainable, especially
at the 128-byte block size. Of course, we must also worry about the effects of the increased
traffic to memory and possible contention for the memory with other cores. This latter effect
may easily negate the gains obtained from improving the performance of a single processor.
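The traffic concern can be made concrete with a rough model. Assuming every L3 miss transfers exactly one block from memory (ignoring write-backs, command and address overhead, and prefetching), memory traffic is proportional to misses × block size. At the break-even point in the table, misses_B / misses_32 = 1 / multiplier, so the traffic ratio is (B / 32) / multiplier. The helper name `relative_traffic` is hypothetical and the model is a simplification, not a measurement.

```python
# Break-even miss-penalty multipliers from the table, relative to 32-byte blocks.
PENALTY_MULTIPLIER = {64: 1.19, 128: 1.36, 256: 1.52}
BASE_BLOCK = 32  # bytes


def relative_traffic(block_bytes: int, multiplier: float,
                     base_block: int = BASE_BLOCK) -> float:
    """Memory traffic relative to the 32-byte L3 at the break-even point.

    Assumes one block moved per miss, so traffic = misses * block size, and
    misses_B / misses_32 = 1 / multiplier; hence (B / base) / multiplier.
    """
    return (block_bytes / base_block) / multiplier


if __name__ == "__main__":
    for block, m in PENALTY_MULTIPLIER.items():
        print(f"{block:>3}-byte blocks: ~{relative_traffic(block, m):.2f}x the memory traffic")
```

Under these assumptions, 128-byte blocks move roughly 2.9 times as many bytes as 32-byte blocks despite incurring fewer misses, which is the kind of added bandwidth demand that can lead to contention among cores sharing the memory system.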
 