Note that each of the six optimizations above has a potential disadvantage that can lead to
increased, rather than decreased, average memory access time.
The rest of this chapter assumes familiarity with the material above and the details in Appendix B. In the Putting It All Together section, we examine the memory hierarchy for a microprocessor designed for a high-end server, the Intel Core i7, as well as one designed for use in a PMD, the ARM Cortex-A8, which is the basis for the processor used in the Apple iPad and several high-end smartphones. Within each of these classes, there is significant diversity in approach due to the intended use of the computer. While the high-end processor used in the server has more cores and bigger caches than the Intel processors designed for desktop use, the processors have similar architectures. The differences are driven by performance and the nature of the workload; desktop computers are primarily running one application at a time on top of an operating system for a single user, whereas server computers may have hundreds of users running potentially dozens of applications simultaneously. Because of these workload differences, desktop computers are generally concerned more with average latency from the memory hierarchy, whereas server computers are also concerned about memory bandwidth. Even within the class of desktop computers there is wide diversity, from lower-end netbooks with scaled-down processors more similar to those found in high-end PMDs, to high-end desktops whose processors contain multiple cores and whose organization resembles that of a low-end server.
In contrast, PMDs not only serve one user but generally also have smaller operating systems, usually less multitasking (running of several applications simultaneously), and simpler applications. PMDs also typically use Flash memory rather than disks, and most must consider both performance and energy consumption, which determines battery life.
2.2 Ten Advanced Optimizations of Cache Performance
The average memory access time formula above gives us three metrics for cache optimizations: hit time, miss rate, and miss penalty. Given the recent trends, we add cache bandwidth and power consumption to this list. We can classify the ten advanced cache optimizations we examine into five categories based on these metrics (a worked instance of the formula and brief illustrative sketches of several of the techniques follow the list):
1. Reducing the hit time —Small and simple first-level caches and way prediction. Both techniques also generally decrease power consumption.
2. Increasing cache bandwidth —Pipelined caches, multibanked caches, and nonblocking caches.
These techniques have varying impacts on power consumption.
3. Reducing the miss penalty —Critical word first and merging write buffers. These optimizations have little impact on power.
4. Reducing the miss rate —Compiler optimizations. Obviously any improvement at compile
time improves power consumption.
5. Reducing the miss penalty or miss rate via parallelism —Hardware prefetching and compiler
prefetching. These optimizations generally increase power consumption, primarily due to
prefetched data that are unused.
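For reference, the formula invoked above (from Appendix B) is

Average memory access time = Hit time + Miss rate × Miss penalty

As a quick worked instance, with numbers assumed here for illustration rather than taken from the text: a 1-cycle hit time, a 2% miss rate, and a 100-cycle miss penalty give an average memory access time of 1 + 0.02 × 100 = 3 cycles. Reducing any of the three terms, or overlapping the miss penalty with other work, improves this average.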
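As a concrete illustration of the way prediction named in item 1, the sketch below models a 2-way set-associative cache in which each set remembers the way that hit most recently and probes only that way first. This is a minimal sketch, not a hardware description; the set count, block size, and function names are illustrative assumptions.

```c
/* Way prediction sketch (item 1): probe the predicted way first; a
 * correct prediction gives near direct-mapped hit time, while a
 * misprediction costs one extra probe. Sizes are assumed for
 * illustration only. */
#include <stdbool.h>
#include <stdint.h>

#define SETS 64
#define WAYS 2
#define BLOCK_BYTES 64

struct cache_set {
    uint32_t tag[WAYS];
    bool     valid[WAYS];
    int      predicted;              /* way to probe first next time */
};

static struct cache_set cache[SETS];

/* Returns the number of tag probes (1 = fast hit, 2 = slow hit or
 * miss) and reports whether the access hit at all. */
int cache_probe(uint32_t addr, bool *hit)
{
    uint32_t set_idx = (addr / BLOCK_BYTES) % SETS;
    uint32_t tag     = addr / (BLOCK_BYTES * SETS);
    struct cache_set *s = &cache[set_idx];

    int w = s->predicted;
    if (s->valid[w] && s->tag[w] == tag) {    /* prediction correct */
        *hit = true;
        return 1;
    }
    int other = (w + 1) % WAYS;               /* probe the other way */
    if (s->valid[other] && s->tag[other] == tag) {
        *hit = true;
        s->predicted = other;                 /* retrain the predictor */
        return 2;
    }
    *hit = false;
    return 2;                                 /* checked both ways: miss */
}
```

The return value makes the cost model explicit: hits in the predicted way behave like hits in a direct-mapped cache, which is why the technique also tends to save power, as item 1 notes.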
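For the multibanked caches of item 2, a common organization is sequential interleaving: the block address modulo the number of banks selects the bank, so consecutive blocks fall in different banks and independent accesses can proceed in parallel. The sketch below shows only this address-to-bank mapping; four banks and 64-byte blocks are assumed for illustration.

```c
/* Sequential interleaving sketch (item 2): block address mod the bank
 * count picks the bank, spreading consecutive blocks across banks.
 * Bank count and block size are illustrative assumptions. */
#include <stdint.h>

#define BANKS 4
#define BLOCK_BYTES 64

static inline unsigned bank_of(uint32_t addr)
{
    uint32_t block_addr = addr / BLOCK_BYTES;
    return block_addr % BANKS;  /* blocks 0,1,2,3 map to banks 0,1,2,3 */
}
```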
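Item 3's merging write buffer can be sketched as follows: a write whose block already has a valid buffer entry is merged into that entry instead of consuming a new slot, improving buffer utilization and reducing stalls when the buffer fills. The entry count, block size, and names here are illustrative assumptions.

```c
/* Merging write buffer sketch (item 3): a write to a block already in
 * the buffer merges into the existing entry rather than allocating a
 * new one. Entry count and block size are assumed for illustration. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ENTRIES 4
#define BLOCK_BYTES 64

struct wb_entry {
    bool     valid;
    uint64_t block_addr;              /* which block this entry holds */
    uint8_t  data[BLOCK_BYTES];
    bool     byte_valid[BLOCK_BYTES]; /* which bytes have been written */
};

static struct wb_entry wbuf[ENTRIES];

/* Returns true if the write was accepted (merged or newly buffered),
 * false if the buffer is full and the processor must stall. */
bool write_buffer_put(uint64_t addr, uint8_t byte)
{
    uint64_t block  = addr / BLOCK_BYTES;
    unsigned offset = addr % BLOCK_BYTES;

    for (int i = 0; i < ENTRIES; i++)     /* try to merge first */
        if (wbuf[i].valid && wbuf[i].block_addr == block) {
            wbuf[i].data[offset] = byte;
            wbuf[i].byte_valid[offset] = true;
            return true;
        }
    for (int i = 0; i < ENTRIES; i++)     /* else take a free entry */
        if (!wbuf[i].valid) {
            memset(wbuf[i].byte_valid, 0, sizeof wbuf[i].byte_valid);
            wbuf[i].valid = true;
            wbuf[i].block_addr = block;
            wbuf[i].data[offset] = byte;
            wbuf[i].byte_valid[offset] = true;
            return true;
        }
    return false;                          /* full: stall the write */
}
```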
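The compiler optimizations of item 4 include loop transformations that improve locality; a classic example is blocking (tiling) a matrix multiply so that each submatrix is reused while it still resides in the cache. The sketch below assumes row-major arrays, an output zeroed by the caller, and an illustrative tile size of 32.

```c
/* Loop blocking sketch (item 4): operate on BLOCK x BLOCK tiles so the
 * working set of the inner loops fits in the cache and each element is
 * reused before being evicted. BLOCK = 32 is an assumed tile size. */
#include <stddef.h>

#define BLOCK 32

void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    /* c must be zero-initialized by the caller; row-major layout. */
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t kk = 0; kk < n; kk += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK)
                for (size_t i = ii; i < ii + BLOCK && i < n; i++)
                    for (size_t k = kk; k < kk + BLOCK && k < n; k++) {
                        double r = a[i * n + k];
                        for (size_t j = jj; j < jj + BLOCK && j < n; j++)
                            c[i * n + j] += r * b[k * n + j];
                    }
}
```

Because the transformation happens entirely at compile time, it carries no runtime hardware cost, which is why item 4 observes that it also helps power.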
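Finally, the compiler prefetching of item 5 can be approximated in source with an intrinsic; the sketch below uses GCC/Clang's __builtin_prefetch, and the prefetch distance of 16 doubles (two 64-byte blocks ahead) is an illustrative assumption that would be tuned to the machine.

```c
/* Software prefetching sketch (item 5): request the block needed a few
 * iterations ahead so its miss overlaps with useful work. Uses the
 * GCC/Clang __builtin_prefetch intrinsic; the distance of 16 elements
 * is an assumed tuning parameter. */
#include <stddef.h>

double sum_with_prefetch(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&x[i + 16], 0 /* read */, 3 /* keep */);
        sum += x[i];
    }
    return sum;
}
```

If the trip count is short or the data is already resident, these prefetches are wasted work, which is exactly the power caveat item 5 raises.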
In general, the hardware complexity increases as we go through these optimizations. In addition, several of the optimizations require sophisticated compiler technology. We will conclude
with a summary of the implementation complexity and the performance benefits of the ten
techniques presented in Figure 2.11 on page 96. Since some of these are straightforward, we
cover them briefly; others require more description.