Performance of the Cortex-A8 Memory Hierarchy
The memory hierarchy of the Cortex-A8 was simulated with 32 KB primary caches and
a 1 MB eight-way set-associative L2 cache using the integer Minnespec benchmarks (see
KleinOsowski and Lilja [2002]). Minnespec is a set of benchmarks consisting of the SPEC2000
benchmarks but with different inputs that reduce the running times by several orders of
magnitude. Although the use of smaller inputs does not change the instruction mix, it does
affect the cache behavior. For example, on mcf, the most memory-intensive SPEC2000 integer
benchmark, Minnespec has a miss rate for a 32 KB cache that is only 65% of the miss rate for
the full SPEC version. For a 1 MB cache the difference is a factor of 6! On many other benchmarks the
ratios are similar to those on mcf, but the absolute miss rates are much smaller. For this reason,
one cannot compare the Minnespec benchmarks against the SPEC2000 benchmarks. Instead,
the data are useful for looking at the relative impact of L1 and L2 misses on overall CPI,
as we do in the next chapter.
The instruction cache miss rates for these benchmarks (and also for the full SPEC2000
versions on which Minnespec is based) are very small even for just the L1: close to zero for
most and under 1% for all of them. This low rate probably results from the computationally
intensive nature of the SPEC programs and the four-way set-associative cache that eliminates
most conflict misses. Figure 2.17 shows the data cache results, which have significant L1 and L2 miss
rates. The L1 miss penalty for a 1 GHz Cortex-A8 is 11 clock cycles, while the L2 miss penalty
is 60 clock cycles, using DDR SDRAMs for the main memory. Using these miss penalties,
Figure 2.18 shows the average penalty per data access. In the next chapter, we will examine the
impact of the cache misses on overall CPI.
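
As a rough illustration of how an average penalty per data access can be derived from miss rates and the two miss penalties quoted above, consider the following Python sketch. It rests on assumptions the text does not state: both miss rates are taken as fractions of all data accesses (so the L2 rate is a global miss rate), and the 60-cycle L2 penalty is treated as being paid in addition to the 11-cycle L1 penalty. The function name and the sample miss rates are hypothetical and are not read from Figure 2.17.

```python
# Miss penalties quoted in the text for a 1 GHz Cortex-A8, in clock cycles.
L1_MISS_PENALTY = 11
L2_MISS_PENALTY = 60


def penalty_per_access(l1_miss_rate, l2_global_miss_rate):
    """Return (L1 part, L2 part, total) in cycles per data access.

    Assumption: both rates are fractions of all data accesses, and an
    access that also misses in L2 pays the L2 penalty on top of the
    L1 penalty.
    """
    l1_part = l1_miss_rate * L1_MISS_PENALTY
    l2_part = l2_global_miss_rate * L2_MISS_PENALTY
    return l1_part, l2_part, l1_part + l2_part


# Hypothetical miss rates chosen only for illustration.
l1_part, l2_part, total = penalty_per_access(0.05, 0.01)
print(f"L1: {l1_part:.2f}  L2: {l2_part:.2f}  total: {total:.2f} cycles per access")
```

With these made-up rates, the L2 contribution is about as large as the L1 contribution even though L2 misses are five times rarer, because each one costs far more cycles.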