FIGURE 2.2 Starting with 1980 performance as a baseline, the gap in performance, measured as the difference in the time between processor memory requests (for a single processor or core) and the latency of a DRAM access, is plotted over time. Note that the vertical axis must be on a logarithmic scale to record the size of the processor-DRAM performance gap. The memory baseline is 64 KB DRAM in 1980, with a 1.07 per year performance improvement in latency (see Figure 2.13 on page 99). The processor line assumes a 1.25 improvement per year until 1986, a 1.52 improvement until 2000, a 1.20 improvement between 2000 and 2005, and no change in processor performance (on a per-core basis) between 2005 and 2010; see Figure 1.1 in Chapter 1.
More recently, high-end processors have moved to multiple cores, further increasing the bandwidth requirements versus single cores. In fact, the aggregate peak bandwidth essentially grows as the number of cores grows. A modern high-end processor such as the Intel Core i7 can generate two data memory references per core each clock cycle; with four cores and a 3.2 GHz clock rate, the i7 can generate a peak of 25.6 billion 64-bit data memory references per second, in addition to a peak instruction demand of about 12.8 billion 128-bit instruction references, for a total peak bandwidth of 409.6 GB/sec! This incredible bandwidth is achieved by multiporting and pipelining the caches; by using multiple levels of caches, with separate first- and sometimes second-level caches per core; and by using separate instruction and data caches at the first level. In contrast, the peak bandwidth to DRAM main memory is only 6% of this (25 GB/sec).
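The bandwidth figures above follow directly from the stated core count, clock rate, and reference widths. A minimal sketch of that arithmetic (using only the numbers quoted in the text):

```python
# Peak-bandwidth arithmetic for the Core i7 example in the text.
cores = 4
clock_hz = 3.2e9                       # 3.2 GHz clock rate
data_refs_per_core_per_cycle = 2       # two data references per core per cycle

# 4 cores * 3.2 GHz * 2 refs = 25.6 billion 64-bit data references/sec
data_refs_per_sec = cores * clock_hz * data_refs_per_core_per_cycle
data_bw = data_refs_per_sec * 8        # 64-bit references = 8 bytes each

# 12.8 billion 128-bit instruction references/sec (one per core per cycle)
instr_refs_per_sec = cores * clock_hz
instr_bw = instr_refs_per_sec * 16     # 128-bit references = 16 bytes each

total_bw_gb = (data_bw + instr_bw) / 1e9   # 409.6 GB/sec
dram_bw_gb = 25.0                          # peak DRAM bandwidth from the text
print(total_bw_gb)                         # 409.6
print(dram_bw_gb / total_bw_gb)            # ~0.061, i.e. about 6%
```

Note that each half of the demand (data and instructions) happens to contribute 204.8 GB/sec, which is why the DRAM figure of 25 GB/sec is such a small fraction of the total.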
Traditionally, designers of memory hierarchies focused on optimizing average memory access time, which is determined by the cache access time, miss rate, and miss penalty. More recently, however, power has become a major consideration. In high-end microprocessors, there may be 10 MB or more of on-chip cache, and a large second- or third-level cache will consume significant power both as leakage when not operating (called static power) and as active power when performing a read or write (called dynamic power), as described in Section 2.3. The problem is even more acute in processors in PMDs, where the CPU is less aggressive and the power budget may be 20 to 50 times smaller. In such cases, the caches can account for 25% to 50% of the total power consumption. Thus, more designs must consider both performance and power trade-offs, and we will examine both in this chapter.
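The three determinants named above combine in the standard average-memory-access-time relation, AMAT = hit time + miss rate x miss penalty. A brief illustration (the specific cache parameters below are hypothetical, chosen only to show the shape of the calculation):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit time plus the expected
    miss cost (miss rate times miss penalty), all in cycles."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical single-level cache: 1-cycle hit, 2% miss rate,
# 100-cycle penalty to reach DRAM -> 1 + 0.02 * 100 = 3.0 cycles.
print(amat(1.0, 0.02, 100.0))  # 3.0
```

The expression makes the design trade-off visible: halving the miss rate or the miss penalty shrinks only the second term, so its payoff depends on how large that term is relative to the hit time.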