Sun's external caches included parity but not ECC, so a fault could be detected but not corrected. If the data in the cache were clean, the processor could simply refetch them from memory; for dirty data, however, the operating system had no choice but to report an error and crash the program. Field engineers found no problems on inspection in more than 90% of the cases.
To reduce the frequency of such errors, Sun modified the Solaris operating system to
“scrub” the cache by having a process that proactively writes dirty data to memory. Since the
processor chips did not have enough pins to add ECC, the only hardware option for dirty data
was to duplicate the external cache, using the copy without the parity error to correct the error.
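
To make the scrubbing idea concrete, here is a minimal, self-contained C model of one scrubbing pass that writes dirty lines back to memory. The cache_line structure, the line count, and the writeback routine are illustrative assumptions for this sketch, not Solaris internals.

/* A minimal model of cache "scrubbing": a sweep writes dirty lines back
 * to memory so that a later parity error is more likely to hit clean data
 * (recoverable by refetching) than dirty data (fatal). The structures
 * here are illustrative assumptions, not Solaris internals. */
#include <stdio.h>
#include <string.h>

#define LINES     8
#define LINE_SIZE 64

struct cache_line {
    int  dirty;                 /* modified since the last writeback? */
    char data[LINE_SIZE];
};

static struct cache_line cache[LINES];
static char memory[LINES][LINE_SIZE];   /* backing store, one block per line */

/* Copy a dirty line back to memory and mark it clean. */
static void writeback(int i)
{
    memcpy(memory[i], cache[i].data, LINE_SIZE);
    cache[i].dirty = 0;
}

/* One scrubbing pass; Solaris ran this proactively as a process. */
static int scrub(void)
{
    int cleaned = 0;
    for (int i = 0; i < LINES; i++)
        if (cache[i].dirty) {
            writeback(i);
            cleaned++;
        }
    return cleaned;
}

int main(void)
{
    strcpy(cache[2].data, "pending update");   /* dirty two lines */
    cache[2].dirty = 1;
    cache[5].dirty = 1;

    printf("scrubbed %d dirty lines\n", scrub());
    return 0;
}

The point of the sweep is to shorten the time any line spends dirty, so a random SRAM fault is more likely to strike clean data that can simply be refetched.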
The pitfall is in detecting faults without providing a mechanism to correct them. These engineers are unlikely to design another computer without ECC on external caches.
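
The detect-versus-correct distinction is easy to see in a few lines of C: even parity over a word reveals that a single bit flipped but not which bit, so parity alone can only signal failure, while a duplicate copy, as in Sun's duplicated external cache, supplies the correction. The word value and the flipped bit position below are arbitrary illustrations.

/* Parity detects a single-bit error in a word but cannot locate it, so
 * dirty data cannot be repaired from parity alone. With a second copy of
 * the data, the copy whose parity still checks out can be used instead.
 * The constants here are arbitrary illustrations. */
#include <stdio.h>
#include <inttypes.h>

/* Even parity over a 32-bit word: 1 if the number of set bits is odd. */
static int parity32(uint32_t w)
{
    int p = 0;
    while (w) { p ^= (int)(w & 1); w >>= 1; }
    return p;
}

int main(void)
{
    uint32_t primary = 0xDEADBEEF, copy = primary;
    int stored_parity = parity32(primary);

    primary ^= 1u << 7;          /* a single-bit fault in the primary copy */

    if (parity32(primary) != stored_parity) {
        /* Detection alone would force a crash for dirty data; the
         * duplicate whose parity still matches supplies the fix. */
        if (parity32(copy) == stored_parity)
            primary = copy;
    }
    printf("recovered word: 0x%08" PRIX32 "\n", primary);
    return 0;
}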
1.12 Concluding Remarks
This chapter has introduced a number of concepts and provided a quantitative framework that we will expand upon throughout the book. Starting with this edition, energy efficiency is the new companion to performance.
In Chapter 2, we start with the all-important area of memory system design. We will examine a wide range of techniques that conspire to make memory look infinitely large while still being as fast as possible. (Appendix B provides introductory material on caches for readers without much experience and background in them.) As in later chapters, we will see that hardware-software cooperation has become a key to high-performance memory systems, just as it has to high-performance pipelines. This chapter also covers virtual machines, an increasingly important technique for protection.
In Chapter 3, we look at instruction-level parallelism (ILP), of which pipelining is the simplest and most common form. Exploiting ILP is one of the most important techniques for building high-speed uniprocessors. Chapter 3 begins with an extensive discussion of basic concepts that will prepare you for the wide range of ideas examined in both chapters. Chapter 3 uses examples that span about 40 years, drawing from one of the first supercomputers (IBM 360/91) to the fastest processors on the market in 2011. It emphasizes what is called the dynamic or run time approach to exploiting ILP. It also talks about the limits to ILP ideas and introduces multithreading, which is further developed in both Chapters 4 and 5. Appendix C provides introductory material on pipelining for readers without much experience and background in pipelining. (We expect it to be a review for many readers, including those of our introductory text, Computer Organization and Design: The Hardware/Software Interface.)
Chapter 4 is new to this edition, and it explains three ways to exploit data-level parallelism.
The classic and oldest approach is vector architecture, and we start there to lay down the principles of SIMD design. (Appendix G goes into greater depth on vector architectures.) We next explain the SIMD instruction set extensions found in most desktop microprocessors today.
The third piece is an in-depth explanation of how modern graphics processing units (GPUs)
work. Most GPU descriptions are written from the programmer's perspective, which usually hides how the computer really works. This section explains GPUs from an insider's perspective, including a mapping between GPU jargon and more traditional architecture terms.
Chapter 5 focuses on the issue of achieving higher performance using multiple processors, or multiprocessors. Instead of using parallelism to overlap individual instructions, multiprocessing uses parallelism to allow multiple instruction streams to be executed simultaneously on different processors. Our focus is on the dominant form of multiprocessors, shared-memory multiprocessors, though we introduce other types as well and discuss the broad issues that arise in any multiprocessor. Here again, we explore a variety of techniques, focusing on the important ideas first introduced in the 1980s and 1990s.
 