elements on one axis want to communicate with the same element on the second. Crossbar
switches are much faster than buses--and more expensive.
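The contention rule for a crossbar can be pictured with a small sketch. This is only an illustration, not any vendor's arbitration logic: the class name CrossbarSketch, the method stalled, and the "input"/"output" terminology are all assumptions made for the example. Every input may talk to a different output in the same cycle; an input waits only when an earlier input has already claimed the output it wants.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one crossbar cycle (illustration only).
class CrossbarSketch {
    // requests[i] is the output that input i wants this cycle, or -1 for none.
    // Returns the inputs that must wait because the output they want is
    // already claimed by a lower-numbered input (an assumed priority rule).
    static List<Integer> stalled(int[] requests, int outputs) {
        boolean[] busy = new boolean[outputs];
        List<Integer> waiting = new ArrayList<>();
        for (int input = 0; input < requests.length; input++) {
            int output = requests[input];
            if (output < 0) {
                continue;               // this input is idle
            }
            if (busy[output]) {
                waiting.add(input);     // conflict: same output wanted twice
            } else {
                busy[output] = true;    // connection granted
            }
        }
        return waiting;
    }
}
```

With four inputs each requesting a different output, nobody stalls. A shared bus, by contrast, would serialize all four transfers.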
The practical limit on crossbar switches right now (1999) seems to be about 4 x 4 (Figure 16-4),
the size of both the Sun and SGI designs. To build machines larger than four CPUs, some
additional interconnect is required. On the larger Sun Ultra machines, a centerplane bus is used
that can accommodate up to 16 quad CPU boards. On the larger SGI machines, an entirely
different approach is used.
Figure 16-4. Cluster Using a Crossbar Switch
Hierarchical Interconnects
The practical (and legal[8]) limit to bus length is approximately 16 boards. Beyond that you have
horrendous problems with signal propagation. The "obvious" solution to this limit is to build a
hierarchical machine with clusters of buses communicating with other clusters of buses, ad
infinitum. In its simplest form, this is no big deal. Want some more CPUs? Just add a new cluster!
Sure, you'll see longer communication latencies as you access more distant clusters, but that's just
the way things are.
[8] 186,000 miles/second. It's not just a good idea, it's the law!
There is one aspect of SMP design that makes a mess of this simple model--cache memory. We
need to use caches to avoid saturating the interconnect, but at the same time caches need to be
kept coherent, and that's tricky. If the cache for CPU 169 contains an entry for address
0x31415926, and CPU 0 writes into that address, how is cache 169 going to get invalidated?
Propagating every invalidate across the entire interconnect would saturate it quickly. The object
now becomes finding a method to propagate invalidations only to those caches that need them.
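One way to picture targeted invalidation is a directory that remembers who has cached what. The sketch below is a minimal illustration under stated assumptions, not the protocol of any real machine: the class InvalidationDirectory, its methods recordRead and recordWrite, and the use of small integer cluster IDs are all hypothetical. It shows only the bookkeeping: a write produces invalidations for exactly those clusters that hold a copy, and for nobody else.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical invalidation directory for one block of memory (illustration only).
class InvalidationDirectory {
    // For each memory line, the set of clusters currently holding a cached copy.
    private final Map<Long, BitSet> sharers = new HashMap<>();

    // Record that a cluster has read, and therefore cached, a line.
    void recordRead(long line, int cluster) {
        sharers.computeIfAbsent(line, k -> new BitSet()).set(cluster);
    }

    // On a write, return only the clusters whose copies must be invalidated.
    // Afterwards the writer is recorded as the sole holder of the line.
    BitSet recordWrite(long line, int writerCluster) {
        BitSet old = sharers.get(line);
        BitSet targets = (old == null) ? new BitSet() : (BitSet) old.clone();
        targets.clear(writerCluster);   // the writer does not invalidate itself

        BitSet now = new BitSet();
        now.set(writerCluster);
        sharers.put(line, now);
        return targets;                 // empty set means no interconnect traffic
    }
}
```

If clusters 3 and 7 have read line 0x31415926 and cluster 0 then writes it, recordWrite returns the set {3, 7}. A second write by cluster 0 returns an empty set, so no invalidation messages cross the interconnect at all.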
Built along the designs of Stanford's DASH project, the SGI Origin (Figure 16-5) uses a small
crossbar for its clusters and an expandable, hierarchical lattice instead of a bus. Embedded in each
cluster is an invalidation directory, which keeps track of which other clusters have cached copies
of its local memory. When main memory is written to, the directory knows to which clusters to