Note that each of the six optimizations above has a potential disadvantage that can lead to
increased, rather than decreased, average memory access time.
The rest of this chapter assumes familiarity with the material above and the details in Appendix B. In the Putting It All Together section, we examine the memory hierarchy for a microprocessor designed for a high-end server, the Intel Core i7, as well as one designed for use in a PMD, the ARM Cortex-A8, which is the basis for the processor used in the Apple iPad and several high-end smartphones. Within each of these classes, there is significant diversity in approach due to the intended use of the computer. While the high-end processor used in the server has more cores and bigger caches than the Intel processors designed for desktop use, the processors have similar architectures. The differences are driven by performance and the nature of the workload; desktop computers are primarily running one application at a time on top of an operating system for a single user, whereas server computers may have hundreds of users running potentially dozens of applications simultaneously. Because of these workload differences, desktop computers are generally concerned more with average latency from the memory hierarchy, whereas server computers are also concerned about memory bandwidth. Even within the class of desktop computers there is wide diversity, from lower-end netbooks with scaled-down processors more similar to those found in high-end PMDs, to high-end desktops whose processors contain multiple cores and whose organization resembles that of a low-end server.
In contrast, PMDs not only serve one user but generally also have smaller operating systems, usually less multitasking (running of several applications simultaneously), and simpler applications. PMDs also typically use Flash memory rather than disks, and most must consider both performance and energy consumption, which determines battery life.
2.2 Ten Advanced Optimizations of Cache Performance
The average memory access time formula above gives us three metrics for cache optimizations: hit time, miss rate, and miss penalty. Given the recent trends, we add cache bandwidth and power consumption to this list. We can classify the ten advanced cache optimizations we examine into five categories based on these metrics (a worked instance of the formula and brief illustrative sketches of several of the techniques follow the list):
1. Reducing the hit time —Small and simple first-level caches and way prediction. Both techniques also generally decrease power consumption.
2. Increasing cache bandwidth —Pipelined caches, multibanked caches, and nonblocking caches.
These techniques have varying impacts on power consumption.
3. Reducing the miss penalty —Critical word first and merging write buffers. These optimizations have little impact on power.
4. Reducing the miss rate —Compiler optimizations. Obviously any improvement at compile
time improves power consumption.
5. Reducing the miss penalty or miss rate via parallelism —Hardware prefetching and compiler
prefetching. These optimizations generally increase power consumption, primarily due to
prefetched data that are unused.
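For reference, the formula invoked above (from Appendix B) is

Average memory access time = Hit time + Miss rate × Miss penalty

As a quick worked instance, with numbers assumed here for illustration rather than taken from the text: a 1-cycle hit time, a 2% miss rate, and a 100-cycle miss penalty give an average memory access time of 1 + 0.02 × 100 = 3 cycles. Reducing any of the three terms, or overlapping the miss penalty with other work, improves this average.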
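As a concrete illustration of the way prediction named in item 1, the sketch below models a 2-way set-associative cache in which each set remembers the way that hit most recently and probes only that way first. This is a minimal sketch, not a hardware description; the set count, block size, and function names are illustrative assumptions.

```c
/* Way prediction sketch (item 1): probe the predicted way first; a
 * correct prediction gives near direct-mapped hit time, while a
 * misprediction costs one extra probe. Sizes are assumed for
 * illustration only. */
#include <stdbool.h>
#include <stdint.h>

#define SETS 64
#define WAYS 2
#define BLOCK_BYTES 64

struct cache_set {
    uint32_t tag[WAYS];
    bool     valid[WAYS];
    int      predicted;              /* way to probe first next time */
};

static struct cache_set cache[SETS];

/* Returns the number of tag probes (1 = fast hit, 2 = slow hit or
 * miss) and reports whether the access hit at all. */
int cache_probe(uint32_t addr, bool *hit)
{
    uint32_t set_idx = (addr / BLOCK_BYTES) % SETS;
    uint32_t tag     = addr / (BLOCK_BYTES * SETS);
    struct cache_set *s = &cache[set_idx];

    int w = s->predicted;
    if (s->valid[w] && s->tag[w] == tag) {    /* prediction correct */
        *hit = true;
        return 1;
    }
    int other = (w + 1) % WAYS;               /* probe the other way */
    if (s->valid[other] && s->tag[other] == tag) {
        *hit = true;
        s->predicted = other;                 /* retrain the predictor */
        return 2;
    }
    *hit = false;
    return 2;                                 /* checked both ways: miss */
}
```

The return value makes the cost model explicit: hits in the predicted way behave like hits in a direct-mapped cache, which is why the technique also tends to save power, as item 1 notes.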
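For the multibanked caches of item 2, a common organization is sequential interleaving: the block address modulo the number of banks selects the bank, so consecutive blocks fall in different banks and independent accesses can proceed in parallel. The sketch below shows only this address-to-bank mapping; four banks and 64-byte blocks are assumed for illustration.

```c
/* Sequential interleaving sketch (item 2): block address mod the bank
 * count picks the bank, spreading consecutive blocks across banks.
 * Bank count and block size are illustrative assumptions. */
#include <stdint.h>

#define BANKS 4
#define BLOCK_BYTES 64

static inline unsigned bank_of(uint32_t addr)
{
    uint32_t block_addr = addr / BLOCK_BYTES;
    return block_addr % BANKS;  /* blocks 0,1,2,3 map to banks 0,1,2,3 */
}
```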
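Item 3's merging write buffer can be sketched as follows: a write whose block already has a valid buffer entry is merged into that entry instead of consuming a new slot, improving buffer utilization and reducing stalls when the buffer fills. The entry count, block size, and names here are illustrative assumptions.

```c
/* Merging write buffer sketch (item 3): a write to a block already in
 * the buffer merges into the existing entry rather than allocating a
 * new one. Entry count and block size are assumed for illustration. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ENTRIES 4
#define BLOCK_BYTES 64

struct wb_entry {
    bool     valid;
    uint64_t block_addr;              /* which block this entry holds */
    uint8_t  data[BLOCK_BYTES];
    bool     byte_valid[BLOCK_BYTES]; /* which bytes have been written */
};

static struct wb_entry wbuf[ENTRIES];

/* Returns true if the write was accepted (merged or newly buffered),
 * false if the buffer is full and the processor must stall. */
bool write_buffer_put(uint64_t addr, uint8_t byte)
{
    uint64_t block  = addr / BLOCK_BYTES;
    unsigned offset = addr % BLOCK_BYTES;

    for (int i = 0; i < ENTRIES; i++)     /* try to merge first */
        if (wbuf[i].valid && wbuf[i].block_addr == block) {
            wbuf[i].data[offset] = byte;
            wbuf[i].byte_valid[offset] = true;
            return true;
        }
    for (int i = 0; i < ENTRIES; i++)     /* else take a free entry */
        if (!wbuf[i].valid) {
            memset(wbuf[i].byte_valid, 0, sizeof wbuf[i].byte_valid);
            wbuf[i].valid = true;
            wbuf[i].block_addr = block;
            wbuf[i].data[offset] = byte;
            wbuf[i].byte_valid[offset] = true;
            return true;
        }
    return false;                          /* full: stall the write */
}
```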
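The compiler optimizations of item 4 include loop transformations that improve locality; a classic example is blocking (tiling) a matrix multiply so that each submatrix is reused while it still resides in the cache. The sketch below assumes row-major arrays, an output zeroed by the caller, and an illustrative tile size of 32.

```c
/* Loop blocking sketch (item 4): operate on BLOCK x BLOCK tiles so the
 * working set of the inner loops fits in the cache and each element is
 * reused before being evicted. BLOCK = 32 is an assumed tile size. */
#include <stddef.h>

#define BLOCK 32

void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    /* c must be zero-initialized by the caller; row-major layout. */
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t kk = 0; kk < n; kk += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK)
                for (size_t i = ii; i < ii + BLOCK && i < n; i++)
                    for (size_t k = kk; k < kk + BLOCK && k < n; k++) {
                        double r = a[i * n + k];
                        for (size_t j = jj; j < jj + BLOCK && j < n; j++)
                            c[i * n + j] += r * b[k * n + j];
                    }
}
```

Because the transformation happens entirely at compile time, it carries no runtime hardware cost, which is why item 4 observes that it also helps power.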
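Finally, the compiler prefetching of item 5 can be approximated in source with an intrinsic; the sketch below uses GCC/Clang's __builtin_prefetch, and the prefetch distance of 16 doubles (two 64-byte blocks ahead) is an illustrative assumption that would be tuned to the machine.

```c
/* Software prefetching sketch (item 5): request the block needed a few
 * iterations ahead so its miss overlaps with useful work. Uses the
 * GCC/Clang __builtin_prefetch intrinsic; the distance of 16 elements
 * is an assumed tuning parameter. */
#include <stddef.h>

double sum_with_prefetch(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&x[i + 16], 0 /* read */, 3 /* keep */);
        sum += x[i];
    }
    return sum;
}
```

If the trip count is short or the data is already resident, these prefetches are wasted work, which is exactly the power caveat item 5 raises.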
In general, the hardware complexity increases as we go through these optimizations. In addition, several of the optimizations require sophisticated compiler technology. We will conclude
with a summary of the implementation complexity and the performance benefits of the ten
techniques presented in Figure 2.11 on page 96. Since some of these are straightforward, we
cover them briefly; others require more description.