send invalidations. The result is that the basic machine can be expanded well past the 16-board
limit of bus-based machines, at a cost of about 150 ns extra latency for each hop across the lattice.
The Origin is spec'd to expand out to 4096 CPUs. Now the only problem is writing programs that
can use 4096 CPUs.
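To get a feel for what the per-hop cost means, here is a back-of-the-envelope sketch in C. It assumes only the rough 150 ns per hop quoted above; the function names and the local-latency parameter are hypothetical, introduced for illustration, and are not figures from SGI.

```c
#include <stdint.h>

/* Approximate extra latency per lattice hop, from the text (about 150 ns). */
enum { HOP_LATENCY_NS = 150 };

/* Extra latency added by crossing `hops` links of the lattice. */
static uint32_t extra_latency_ns(uint32_t hops)
{
    return hops * HOP_LATENCY_NS;
}

/* Total latency of a remote access, given an assumed on-board latency.
 * `local_ns` is a caller-supplied assumption, not a measured Origin value. */
static uint32_t remote_latency_ns(uint32_t local_ns, uint32_t hops)
{
    return local_ns + extra_latency_ns(hops);
}
```

Under this simple linear model, a reference that crosses three hops pays roughly 450 ns on top of whatever the on-board access costs. Real machines will not be this tidy, but the model shows why data placement matters more as the machine grows.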
Figure 16-5. Hierarchical Design of the SGI Origin Series
What the Origin implements is a cache-coherent nonuniform memory architecture (CCNUMA). The
Origin clearly supports a coherent cache via its elaborate scheme for directing cache invalidates. It
also supports nonuniform (speed) memory access, as on-board memory access is much faster than
off-board access. There are also strictly bus-based systems that are CCNUMA. Sun's machines are
not among these, as they all define memory access to run at the same speed on-board and off-board.
Packet-Switched Buses and ldstub
There is one place where we care about the bus design very directly (see Figure 16-6). Remember
ldstub, the mutex instruction? Well, the definition of ldstub says that it must perform its work
atomically. For a packet-switched bus, this means that it must retain bus ownership throughout the
entire operation, first fetching the byte in question, then writing all ones out to it. In other words,
using ldstub completely defeats the packet-switched nature of a packet-switched bus!
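One common way to limit this cost is to spin on ordinary loads and issue the atomic operation only when the lock looks free. The sketch below illustrates that idea in portable C11 rather than SPARC assembly. It assumes that `atomic_exchange` on a byte is a fair stand-in for ldstub (fetch the old byte, store all ones); the names `spin_byte`, `ldstub_like`, `try_lock`, `lock`, and `unlock` are hypothetical, and this is not how any particular threads library implements its mutexes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One lock byte: 0 means free, 0xFF means held (as ldstub would leave it). */
typedef atomic_uchar spin_byte;

/* Stand-in for ldstub: atomically fetch the old byte and store all ones. */
static unsigned char ldstub_like(spin_byte *b)
{
    return atomic_exchange(b, 0xFF);
}

/* One attempt: we own the lock only if the byte was 0 before our store. */
static bool try_lock(spin_byte *b)
{
    return ldstub_like(b) == 0;
}

/* Spin until the lock is ours. The inner loop uses plain loads, which can
 * be satisfied from the local cache, so the expensive atomic operation is
 * issued only when the lock appears to be free. */
static void lock(spin_byte *b)
{
    while (!try_lock(b)) {
        while (atomic_load_explicit(b, memory_order_relaxed) != 0) {
            /* busy-wait */
        }
    }
}

/* Release the lock. Unlocking a lock that is not held is a program bug. */
static void unlock(spin_byte *b)
{
    assert(atomic_load(b) == 0xFF);
    atomic_store(b, 0);
}
```

A caller would declare `spin_byte m = 0;` and bracket its critical section with `lock(&m)` and `unlock(&m)`. The point of the inner loop is that waiting threads mostly stay off the bus until the owner releases the lock.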
Figure 16-6. Packet-Switched Memory Bus Running ldstub