Chapter 16. Hardware
In which we look at the various designs for SMP machines (cache architectures, interconnect
topologies, atomic instructions, invalidation techniques) and consider how those designs affect our
programming decisions. We also look at some optimization possibilities.
Types of Multiprocessors
In dealing with MT as we have described it here, we are also making some assumptions about the
hardware we are going to be using. Everything we discussed is based on our using shared memory
symmetric multiprocessor (SMP) machines. There are several other types of multiprocessor
machines, such as distributed shared memory multiprocessors (Cray T3D, etc.) and massively
parallel multiprocessors (CM-1, etc.), but these require very different programming techniques.
Shared Memory Symmetric Multiprocessors
The fundamental design of this machine requires that all processors see all of main memory in an
identical fashion. Even though a memory bank might be physically closer to one CPU than
another, there is no programming-level distinction in how that memory is accessed. (Hardware
designers can do all sorts of clever things to optimize memory access behind our backs, as long as
we are never aware of them.)
The other distinguishing aspect of this machine is that all CPUs have full access to all resources
(kernel, disks, networks, interrupts, etc.) and are treated as peers by the operating system. Any
CPU can run kernel code at any time (respecting locked regions, of course) to do anything. Any
CPU can write out to any disk, network device, etc., at any time. Hardware interrupts may be
delivered to any CPU, although this is a weaker requirement and is not always followed.
In practice, interrupts are generally distributed to CPUs in a round-robin fashion.
All of the multiprocessors in the PC, workstation, and server realms are shared memory symmetric
multiprocessors: the two-way Compaq machines and all of the Sun, SGI, HP, DEC, HAL, and
IBM RISC machines. (IBM also builds the SP-2, a large, distributed memory machine--basically,
a cluster of PowerServers.) Obviously, all manufacturers have their own internal designs and
optimizations, but for our purposes, they have essentially the same architecture.
All of the CPUs have the same basic design. There's the CPU proper (registers, instruction set,
fetch, decode, execution units, etc.), and there's the interface to the memory system. Two
components of the memory interface are of particular interest to us. First there's an internal cache
(I$) and an external cache (E$); then there's a store buffer. The I$ holds the most recently accessed
words and provides single-cycle access for the CPU. Should the I$ in CPU 0 contain a word that
CPU 1 changes, there has to be some way for CPU 0 to be made aware of this change. E$ access is
about 5 cycles, with the same coherency issue. This is problem 1.