FIGURE 2.2 Starting with 1980 performance as a baseline, the gap in performance, measured as the difference in the time between processor memory requests (for a single processor or core) and the latency of a DRAM access, is plotted over time. Note that the vertical axis must be on a logarithmic scale to record the size of the processor-DRAM performance gap. The memory baseline is 64 KB DRAM in 1980, with a 1.07 per year performance improvement in latency (see Figure 2.13 on page 99). The processor line assumes a 1.25 improvement per year until 1986, a 1.52 improvement until 2000, a 1.20 improvement between 2000 and 2005, and no change in processor performance (on a per-core basis) between 2005 and 2010; see Figure 1.1 in Chapter 1.
More recently, high-end processors have moved to multiple cores, further increasing the bandwidth requirements versus single cores. In fact, the aggregate peak bandwidth essentially grows as the number of cores grows. A modern high-end processor such as the Intel Core i7 can generate two data memory references per core each clock cycle; with four cores and a 3.2 GHz clock rate, the i7 can generate a peak of 25.6 billion 64-bit data memory references per second, in addition to a peak instruction demand of about 12.8 billion 128-bit instruction references, for a total peak bandwidth of 409.6 GB/sec! This incredible bandwidth is achieved by multiporting and pipelining the caches; by using multiple levels of caches, with separate first- and sometimes second-level caches per core; and by using separate instruction and data caches at the first level. In contrast, the peak bandwidth to DRAM main memory is only 6% of this (25 GB/sec).
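The bandwidth figures above follow directly from the stated core count, clock rate, and reference widths. A minimal sketch of that arithmetic (using only the numbers quoted in the text):

```python
# Peak-bandwidth arithmetic for the Core i7 example in the text.
cores = 4
clock_hz = 3.2e9                       # 3.2 GHz clock rate
data_refs_per_core_per_cycle = 2       # two data references per core per cycle

# 4 cores * 3.2 GHz * 2 refs = 25.6 billion 64-bit data references/sec
data_refs_per_sec = cores * clock_hz * data_refs_per_core_per_cycle
data_bw = data_refs_per_sec * 8        # 64-bit references = 8 bytes each

# 12.8 billion 128-bit instruction references/sec (one per core per cycle)
instr_refs_per_sec = cores * clock_hz
instr_bw = instr_refs_per_sec * 16     # 128-bit references = 16 bytes each

total_bw_gb = (data_bw + instr_bw) / 1e9   # 409.6 GB/sec
dram_bw_gb = 25.0                          # peak DRAM bandwidth from the text
print(total_bw_gb)                         # 409.6
print(dram_bw_gb / total_bw_gb)            # ~0.061, i.e. about 6%
```

Note that each half of the demand (data and instructions) happens to contribute 204.8 GB/sec, which is why the DRAM figure of 25 GB/sec is such a small fraction of the total.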
Traditionally, designers of memory hierarchies focused on optimizing average memory access time, which is determined by the cache access time, miss rate, and miss penalty. More recently, however, power has become a major consideration. In high-end microprocessors, there may be 10 MB or more of on-chip cache, and a large second- or third-level cache will consume significant power both as leakage when not operating (called static power) and as active power when performing a read or write (called dynamic power), as described in Section 2.3. The problem is even more acute in processors in PMDs, where the CPU is less aggressive and the power budget may be 20 to 50 times smaller. In such cases, the caches can account for 25% to 50% of the total power consumption. Thus, more designs must consider both performance and power trade-offs, and we will examine both in this chapter.
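The three determinants named above combine in the standard average-memory-access-time relation, AMAT = hit time + miss rate x miss penalty. A brief illustration (the specific cache parameters below are hypothetical, chosen only to show the shape of the calculation):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit time plus the expected
    miss cost (miss rate times miss penalty), all in cycles."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical single-level cache: 1-cycle hit, 2% miss rate,
# 100-cycle penalty to reach DRAM -> 1 + 0.02 * 100 = 3.0 cycles.
print(amat(1.0, 0.02, 100.0))  # 3.0
```

The expression makes the design trade-off visible: halving the miss rate or the miss penalty shrinks only the second term, so its payoff depends on how large that term is relative to the hit time.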