Chapter 16. Hardware - Types of Multiprocessors - Shared Memory Symmetric Multiprocessors - The CPU - Multithreaded Programming with JAVA

Chapter 16. Hardware

Types of Multiprocessors

Bus Architectures

Memory Systems

In which we look at the various designs for SMP machines (cache architectures, interconnect

topologies, atomic instructions, invalidation techniques) and consider how those designs affect our

programming decisions. We also look at some optimization possibilities.

Types of Multiprocessors

In dealing with MT as we have described it here, we are also making some assumptions about the

hardware we are going to be using. Everything we have discussed assumes shared memory

symmetric multiprocessor (SMP) machines. There are several other types of multiprocessor

machines, such as distributed shared memory multiprocessors (Cray T3D, etc.) and massively

parallel multiprocessors (CM-1, etc.), but these require very different programming techniques.

Shared Memory Symmetric Multiprocessors

The fundamental design of this machine requires that all processors see all of main memory in an

identical fashion. Even though a memory bank might be physically closer to one CPU than

another, there is no programming-level distinction in how that memory is accessed. (Hardware

designers can do all sorts of clever things to optimize memory access behind our backs, as long as

we are never aware of them.)

The other distinguishing aspect of this machine is that all CPUs have full access to all resources

(kernel, disks, networks, interrupts, etc.) and are treated as peers by the operating system. Any

CPU can run kernel code at any time (respecting locked regions, of course) to do anything. Any

CPU can write out to any disk, network device, etc., at any time. Hardware interrupts may be

delivered to any CPU, although this is a weaker requirement and is not always followed.[1]

[1] In practice, interrupts are generally distributed to CPUs in a round-robin fashion.

All of the multiprocessors in the PC, workstation, and server realms are shared memory symmetric

multiprocessors: the two-way Compaq machines and all of the Sun, SGI, HP, DEC, HAL, and

IBM RISC machines. (IBM also builds the SP-2, a large, distributed memory machine--basically,

a cluster of PowerServers.) Obviously, all manufacturers have their own internal designs and

optimizations, but for our purposes, they have essentially the same architecture.
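
To make the "peers sharing one memory" idea concrete, here is a minimal sketch in Java. It is an illustration under stated assumptions, not code from this chapter: the class name SharedSum and the method parallelSum are hypothetical, and it uses lambda syntax, so it assumes Java 8 or later. It starts one thread per processor the JVM reports and has every thread read the same array. Nothing in the code names a CPU or a memory bank.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SharedSum {
    // Hypothetical helper: sums a shared array with one thread per processor.
    static long parallelSum(int[] data) {
        // The JVM reports how many processors it may use; the program never
        // says which CPU runs which thread.
        int n = Math.max(1, Runtime.getRuntime().availableProcessors());
        AtomicLong total = new AtomicLong();
        Thread[] workers = new Thread[n];
        for (int t = 0; t < n; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                long local = 0;
                // Every worker reads a strided slice of the same array.
                for (int i = id; i < data.length; i += n) {
                    local += data[i];
                }
                total.addAndGet(local);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        int[] data = new int[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data));
    }
}
```

The same source runs unchanged on a one-CPU or a many-CPU machine. Only the number of worker threads differs.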

The CPU

All of the CPUs have the same basic design. There's the CPU proper (registers, instruction set,

fetch, decode, execution units, etc.), and there's the interface to the memory system. Two

components of the memory interface are of particular interest to us: the caches and the store

buffer. First there's an internal cache (I$[2]--typically 20 to 32 kB), then an external cache

(E$--typically 0.5 to 16 MB),[3] and finally, there's a store buffer. The I$ holds all of the most

recently accessed words and provides single-cycle access for the CPU. Should the I$ in CPU 0

contain a word that CPU 1 changes, there has to be some way for CPU 0 to be aware of this

change. E$ access is about 5 cycles, with the same coherency issue. This is problem 1.
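
At the Java source level, the coherency question above shows up as a visibility question: when does a value written by one thread become visible to another? The following is a minimal sketch, not the author's code. The names VisibilityDemo, handOff, ready, and payload are hypothetical, and it assumes the Java memory model of Java 5 and later, where a write to a volatile field is guaranteed to be seen by a later read of that field in another thread. It also assumes Java 8 or later for the lambda syntax.

```java
public class VisibilityDemo {
    // volatile: once the writer sets this flag, a later read in another
    // thread must observe it, whatever either CPU has cached.
    static volatile boolean ready = false;

    // Ordinary field, published by the volatile write that follows it.
    static int payload = 0;

    // Hypothetical helper: hands one value to a reader thread and returns
    // what the reader saw.
    static int handOff(int value) {
        ready = false;
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            // Spin until the writer's update to the flag becomes visible.
            while (!ready) {
                Thread.yield();
            }
            seen[0] = payload;
        });
        reader.start();

        payload = value; // written before the flag
        ready = true;    // volatile write makes payload visible too

        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(handOff(42));
    }
}
```

Without volatile on the flag, the language would not guarantee that the reader's loop ever ends. The program relies on the language rule and does not depend on any particular cache design.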
