The remote request cost is
Hence, we can compute the CPI: CPI = 0.5 + 1.2 = 1.7.
The multiprocessor with all local references is 1.7/0.5 = 3.4 times faster. In practice,
the performance analysis is much more complex, since some fraction of the
noncommunication references will miss in the local hierarchy and the remote
access time does not have a single constant value. For example, the cost of a
remote reference could be quite a bit worse, since contention caused by many
references trying to use the global interconnect can lead to increased delays.
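The 1.7/0.5 comparison above can be reproduced with a small calculation. The sketch below assumes a simple additive model in which each remote reference adds a fixed stall to the base CPI. The function name `effective_cpi`, the remote request rate, and the remote request cost are illustrative assumptions, chosen only so that the result matches the 1.7 and 0.5 quoted in the text; they are not taken from the original example. The `contention_factor` parameter is likewise hypothetical, a crude way to show how contention on the global interconnect would inflate the remote cost.

```python
def effective_cpi(base_cpi, remote_rate, remote_cost_cycles, contention_factor=1.0):
    """CPI when a fraction of instructions stall on a remote reference.

    Assumed model: CPI = base CPI + remote rate * remote cost (in cycles).
    contention_factor is a hypothetical multiplier on the remote cost that
    stands in for queueing delay on the global interconnect.
    """
    return base_cpi + remote_rate * remote_cost_cycles * contention_factor


BASE_CPI = 0.5       # from the text: CPI when all references are local
REMOTE_RATE = 0.002  # assumed: remote references per instruction
REMOTE_COST = 600    # assumed: stall cycles per remote reference

cpi_remote = effective_cpi(BASE_CPI, REMOTE_RATE, REMOTE_COST)
slowdown = cpi_remote / BASE_CPI

print(f"CPI with remote references: {cpi_remote:.2f}")
print(f"All-local configuration is {slowdown:.2f}x faster")

# Doubling the remote cost through contention widens the gap further.
cpi_contended = effective_cpi(BASE_CPI, REMOTE_RATE, REMOTE_COST,
                              contention_factor=2.0)
print(f"CPI under contention: {cpi_contended:.2f}")
```

With these assumed inputs the sketch reports a CPI of about 1.7 and a ratio of about 3.4, in line with the figures in the text. A single constant cost is exactly the simplification the paragraph above warns about, so the contended case should be read as a rough indication of direction rather than a prediction.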
These problems, insufficient parallelism and long-latency remote communication, are the
two biggest performance challenges in using multiprocessors. The problem of inadequate
application parallelism must be attacked primarily in software with new algorithms that offer
better parallel performance, as well as by software systems that maximize the amount of time
spent executing with the full complement of processors. Reducing the impact of long remote
latency can be attacked both by the architecture and by the programmer. For example, we
can reduce the frequency of remote accesses with either hardware mechanisms, such as
caching shared data, or software mechanisms, such as restructuring the data to make more
accesses local. We can try to tolerate the latency by using multithreading (discussed later in this
chapter) or by using prefetching (a topic we cover extensively in Chapter 2).
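To see how reducing remote access frequency and tolerating latency interact, an additive CPI model (base CPI plus remote rate times remote cost) can be extended with two hypothetical knobs: `cached_fraction`, the share of remote references eliminated by caching shared data or restructuring the data, and `hidden_fraction`, the share of the remaining remote latency overlapped by multithreading or prefetching. Both parameters, the function name `cpi_with_mitigation`, and every numeric input other than the base CPI of 0.5 are assumptions for illustration; real systems do not expose such clean, independent factors.

```python
def cpi_with_mitigation(base_cpi, remote_rate, remote_cost_cycles,
                        cached_fraction=0.0, hidden_fraction=0.0):
    """Additive CPI model with two hypothetical mitigation knobs.

    cached_fraction: share of remote references turned into local ones.
    hidden_fraction: share of remote latency overlapped with useful work.
    """
    for name, value in (("cached_fraction", cached_fraction),
                        ("hidden_fraction", hidden_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    effective_rate = remote_rate * (1.0 - cached_fraction)       # fewer remote accesses
    exposed_cost = remote_cost_cycles * (1.0 - hidden_fraction)  # less visible latency
    return base_cpi + effective_rate * exposed_cost


# Assumed inputs: 0.2% of instructions go remote, 600 stall cycles each.
scenarios = {
    "no mitigation": {},
    "caching removes half of remote refs": {"cached_fraction": 0.5},
    "half of latency hidden": {"hidden_fraction": 0.5},
    "both": {"cached_fraction": 0.5, "hidden_fraction": 0.5},
}

for label, knobs in scenarios.items():
    cpi = cpi_with_mitigation(0.5, 0.002, 600, **knobs)
    print(f"{label:40s} CPI = {cpi:.2f}")
```

Because the two factors multiply in this model, applying both gives more benefit than either alone, which is one way to read the text's point that the architecture and the programmer can each attack the problem.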
Much of this chapter focuses on techniques for reducing the impact of long remote
communication latency. For example, Sections 5.2 through 5.4 discuss how caching can be used to
reduce remote access frequency, while maintaining a coherent view of memory. Section 5.5
discusses synchronization, which, because it inherently involves interprocessor communication
and also can limit parallelism, is a major potential bottleneck. Section 5.6 covers latency-hiding
techniques and memory consistency models for shared memory. In Appendix I, we focus
primarily on larger-scale multiprocessors that are used predominantly for scientific work. In