Figure 16-4. Cluster Using a Crossbar Switch - Hierarchical Interconnects - Multithreaded Programming with JAVA

elements on one axis want to communicate with the same element on the second. Crossbar switches are much faster than buses, but also more expensive.

The practical limit on crossbar switches right now (1999) seems to be about 4 x 4 (Figure 16-4), the size of both the Sun and SGI designs. To build machines with more than four CPUs, some additional interconnect is required. The larger Sun Ultra machines use a centerplane bus that can accommodate up to 16 quad-CPU boards. The larger SGI machines use an entirely different approach.

Figure 16-4. Cluster Using a Crossbar Switch
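To make the bus-versus-crossbar contrast concrete, here is a toy Java model. The class and method names are hypothetical and not taken from any real system. Each CPU issues at most one request to a destination such as a memory bank. A bus serializes every transfer. A crossbar can carry transfers to distinct destinations in the same cycle, so requests wait only when several CPUs target the same destination.

```java
/** Toy model (hypothetical): cycles needed to deliver one round of requests. */
class CrossbarSketch {

    // A bus carries one transfer per cycle, so every request waits its turn.
    static int busCycles(int[] dest) {
        int requests = 0;
        for (int d : dest) {
            if (d >= 0) requests++;   // -1 means "this CPU is idle"
        }
        return requests;
    }

    // A crossbar connects each source to a distinct destination simultaneously.
    // With at most one request per source, the only waiting happens when several
    // sources queue for the same destination, which serves one request per cycle.
    static int crossbarCycles(int[] dest, int destinations) {
        int[] queued = new int[destinations];
        for (int d : dest) {
            if (d >= 0) queued[d]++;
        }
        int cycles = 0;
        for (int q : queued) cycles = Math.max(cycles, q);
        return cycles;
    }

    public static void main(String[] args) {
        int[] permutation = {1, 2, 3, 0}; // each CPU targets a different bank
        int[] hotSpot = {2, 2, 2, 0};     // three CPUs all want bank 2
        System.out.println("bus:      " + busCycles(permutation) + " and "
            + busCycles(hotSpot) + " cycles");
        System.out.println("crossbar: " + crossbarCycles(permutation, 4) + " and "
            + crossbarCycles(hotSpot, 4) + " cycles");
    }
}
```

The bus needs 4 cycles in both cases. The crossbar needs 1 cycle when every CPU targets a different bank, but the hot-spot pattern still serializes three requests on bank 2.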

Hierarchical Interconnects

The practical (and legal[8]) limit to bus length is approximately 16 boards. Beyond that, you run into horrendous problems with signal propagation. The "obvious" solution to this limit is to build a hierarchical machine with clusters of buses communicating with other clusters of buses, ad infinitum. In its simplest form, this is no big deal. Want some more CPUs? Just add a new cluster! Sure, you'll see longer communication latencies as you access more distant clusters, but that's just the way things are.

[8] 186,000 miles/second. It's not just a good idea, it's the law!
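The cost of reaching "more distant clusters" can be sketched with a toy model. It assumes a regular hierarchy, which is hypothetical and for illustration only. CPUs are grouped `fanout` to a cluster, clusters are grouped `fanout` to a super-cluster, and so on. The distance between two CPUs is the height of the smallest cluster containing both. The latency parameters are made-up numbers, not measurements of any real machine.

```java
/** Toy model (hypothetical): distance and latency in a regular cluster hierarchy. */
class HierarchySketch {

    // Height of the smallest cluster that contains both CPUs, assuming every
    // cluster groups `fanout` children: 0 = same CPU, 1 = same cluster,
    // 2 = neighboring clusters in one super-cluster, and so on.
    static int levelsApart(int cpuA, int cpuB, int fanout) {
        if (fanout < 2 || cpuA < 0 || cpuB < 0) {
            throw new IllegalArgumentException("need fanout >= 2 and non-negative CPU ids");
        }
        int levels = 0;
        while (cpuA != cpuB) {
            cpuA /= fanout;   // move both CPUs up one level of the hierarchy
            cpuB /= fanout;
            levels++;
        }
        return levels;
    }

    // Made-up cost model: a fixed base cost plus a fixed cost per level crossed.
    static int latencyNs(int cpuA, int cpuB, int fanout, int baseNs, int perLevelNs) {
        return baseNs + perLevelNs * levelsApart(cpuA, cpuB, fanout);
    }

    public static void main(String[] args) {
        int fanout = 4; // hypothetical: 4 CPUs per cluster, 4 clusters per super-cluster, ...
        for (int target : new int[] {3, 4, 15, 16, 63}) {
            System.out.println("CPU 0 -> CPU " + target + ": "
                + levelsApart(0, target, fanout) + " level(s), ~"
                + latencyNs(0, target, fanout, 100, 150) + " ns");
        }
    }
}
```

In this model, adding a cluster does not change the cost of talking to a neighbor. Each extra level of hierarchy, however, adds another increment to the worst-case latency.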

There is one aspect of SMP design that makes a mess of this simple model: cache memory. We need caches to avoid saturating the interconnect, but at the same time the caches must be kept coherent, and that's tricky. If the cache for CPU 169 contains an entry for address x31415926 and CPU 0 writes to that address, how is cache 169 going to get invalidated? Propagating every invalidation across the entire interconnect would quickly saturate it. The objective, then, is to find a way to propagate invalidations only to the caches that need them.
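One way to send invalidations only where they are needed is to record, for every memory line, which caches hold a copy. The sketch below is a simplified, single-threaded software model of that idea, assuming 256 caches. The class and method names are hypothetical. A real hardware directory, such as the SGI Origin's, may track clusters rather than individual caches. The sketch replays the CPU 169 / CPU 0 scenario and compares the message count with broadcasting to every cache.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Toy invalidation directory (hypothetical, single-threaded, software-only model). */
class DirectorySketch {
    private final int numCaches;
    private final Map<Long, BitSet> sharers = new HashMap<>(); // address -> caches holding it
    private long sent = 0;                                     // invalidation messages sent

    DirectorySketch(int numCaches) {
        this.numCaches = numCaches;
    }

    // A cache loaded the line at this address: remember it as a sharer.
    void read(int cache, long address) {
        sharers.computeIfAbsent(address, a -> new BitSet(numCaches)).set(cache);
    }

    // A cache wrote the line: invalidate only the other caches that hold a copy,
    // after which the writer is the sole holder.
    void write(int cache, long address) {
        BitSet holders = sharers.computeIfAbsent(address, a -> new BitSet(numCaches));
        for (int c = holders.nextSetBit(0); c >= 0; c = holders.nextSetBit(c + 1)) {
            if (c != cache) sent++;
        }
        holders.clear();
        holders.set(cache);
    }

    long invalidationsSent() {
        return sent;
    }

    // Messages a broadcast scheme would send for one write: every other cache.
    long broadcastCost() {
        return numCaches - 1;
    }

    public static void main(String[] args) {
        DirectorySketch dir = new DirectorySketch(256); // assume 256 caches
        long address = 0x31415926L;
        dir.read(169, address); // only CPU 169's cache holds the line
        dir.write(0, address);  // CPU 0 writes it
        System.out.println("directory: " + dir.invalidationsSent()
            + " invalidation(s); broadcast: " + dir.broadcastCost());
    }
}
```

With only CPU 169 holding the line, the directory sends one invalidation where a broadcast would send 255. The gap widens as the machine grows.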

Based on the design of Stanford's DASH project, the SGI Origin (Figure 16-5) uses a small crossbar for its clusters and an expandable, hierarchical lattice instead of a bus. Embedded in each cluster is an invalidation directory, which keeps track of which other clusters have cached copies of its local memory. When main memory is written to, the directory knows to which clusters to
