observed a reduction in the effective miss penalty of 20% for the SPECINT92 benchmarks and
30% for the SPECFP92 benchmarks when allowing one hit under miss.
Li, Chen, Brockman, and Jouppi [2011] recently updated this study to use a multilevel
cache, more modern assumptions about miss penalties, and the larger and more demanding
SPEC2006 benchmarks. The study was done assuming a model based on a single core of an
Intel i7 (see Section 2.6) running the SPEC2006 benchmarks. Figure 2.5 shows the reduction in
data cache access latency when allowing 1, 2, and 64 hits under a miss; the caption describes
further details of the memory system. The larger caches and the addition of an L3 cache since
the earlier study have reduced the benefits, with the SPECINT2006 benchmarks showing an
average reduction in cache latency of about 9% and the SPECFP2006 benchmarks about 12.5%.
FIGURE 2.5 The effectiveness of a nonblocking cache is evaluated by allowing 1, 2, or
64 hits under a cache miss with 9 SPECINT (on the left) and 9 SPECFP (on the right)
benchmarks. The data memory system, modeled after the Intel i7, consists of a 32 KB L1
cache with a four-cycle access latency. The L2 cache (shared with instructions) is 256 KB with
a 10-clock-cycle access latency. The L3 is 2 MB with a 36-cycle access latency. All the caches
are eight-way set associative and have a 64-byte block size. Allowing one hit under miss
reduces the miss penalty by 9% for the integer benchmarks and 12.5% for the floating-point
benchmarks. Allowing a second hit improves these results to 10% and 16%, and allowing 64
results in little additional improvement.
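To make the caption's percentages concrete, the following Python sketch applies the reported average reductions to a blocking-cache miss penalty. It is only an illustration under stated assumptions: the names REDUCTION and effective_penalty are hypothetical, the L2 and L3 access latencies from the caption are used as stand-ins for the base miss penalty, and the average reduction is applied uniformly, which the study itself does not claim for any individual benchmark.

```python
# Latencies taken from the Figure 2.5 caption (model based on the Intel i7).
# Assumption: we treat them as the base miss penalty of a blocking cache.
L2_LATENCY_CYCLES = 10  # L1 miss that hits in L2
L3_LATENCY_CYCLES = 36  # L1 and L2 miss that hits in L3

# Average reductions reported in the caption, keyed by
# (benchmark suite, number of hits allowed under a miss).
REDUCTION = {
    ("int", 1): 0.09,
    ("int", 2): 0.10,
    ("fp", 1): 0.125,
    ("fp", 2): 0.16,
}


def effective_penalty(base_cycles, suite, hits_under_miss):
    """Scale a base miss penalty by the reported average reduction.

    Assumption: the suite-wide average applies uniformly to this penalty.
    """
    return base_cycles * (1.0 - REDUCTION[(suite, hits_under_miss)])


if __name__ == "__main__":
    for suite, hits in sorted(REDUCTION):
        cycles = effective_penalty(L2_LATENCY_CYCLES, suite, hits)
        print(f"{suite}, {hits} hit(s) under miss: {cycles:.2f} cycles")
```

Under these assumptions, a 10-cycle penalty shrinks by about one cycle for the integer benchmarks and a little more for the floating-point benchmarks, which illustrates why the caption describes the gain from a second hit as modest.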
Example
Which is more important for floating-point programs: two-way set associativity
or hit under one miss for the primary data caches? What about integer
programs? Assume the following average miss rates for 32 KB data caches: 5.2% for
floating-point programs with a direct-mapped cache, 4.9% for these programs
with a two-way set associative cache, 3.5% for integer programs with a
direct-mapped cache, and 3.2% for integer programs with a two-way set associative
 