Multithreaded Programming with JAVA

send invalidations. The result is that the basic machine can be expanded well past the 16-board limit of bus-based machines, at a cost of about 150 ns extra latency for each hop across the lattice. The Origin is spec'd to expand out to 4096 CPUs. Now the only problem is writing programs that can use 4096 CPUs.

Figure 16-5. Hierarchical Design of the SGI Origin Series

ccNUMA

Cache-coherent nonuniform memory architecture (ccNUMA) is what the Origin implements. The Origin clearly supports a coherent cache via its elaborate directory-based scheme for cache invalidations. It also supports nonuniform (speed) memory access, as on-board memory access is much faster than off-board access. There are also strictly bus-based systems that are ccNUMA. Sun's machines are not among these, as they all define memory access to run at the same speed on-board and off-board.
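To make the "nonuniform" part concrete, here is a minimal sketch of the added cost of off-board access, using the figure of about 150 ns per hop quoted earlier for the Origin. The class and method names are hypothetical, and the model is deliberately crude: it computes only the extra latency, not the absolute access time, which the text does not give.

```java
public class NumaLatency {
    // From the text: roughly 150 ns of added latency per hop across the lattice.
    static final int EXTRA_NS_PER_HOP = 150;

    // Extra latency (in ns) for memory that is `hops` hops away.
    // Zero hops models on-board memory, which pays no penalty.
    public static int extraLatencyNs(int hops) {
        if (hops < 0) {
            throw new IllegalArgumentException("hops must be >= 0");
        }
        return EXTRA_NS_PER_HOP * hops;
    }

    public static void main(String[] args) {
        for (int hops = 0; hops <= 3; hops++) {
            System.out.println(hops + " hop(s): +" + extraLatencyNs(hops) + " ns");
        }
    }
}
```

On a machine where memory access runs at the same speed on-board and off-board, the equivalent function would simply return a constant.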

Packet-Switched Buses and ldstub

There is one place where we care about the bus design very directly (see Figure 16-6). Remember ldstub, the mutex instruction? Well, the definition of ldstub says that it must perform its work atomically. For a packet-switched bus, this means that it must retain bus ownership throughout the entire operation, first fetching the byte in question, then writing all ones out to it. In other words, using ldstub completely defeats the packet-switched nature of a packet-switched bus!
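The fetch-then-write-all-ones behavior described above can be sketched in Java. This is only a model, not how a JVM or the hardware implements it: the class LdstubLock is hypothetical, and AtomicInteger.getAndSet stands in for the single atomic bus operation, with 0x00 meaning "free" and 0xFF meaning "held".

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LdstubLock {
    // Stand-in for the lock byte in memory: 0x00 = free, 0xFF = held.
    private final AtomicInteger lockByte = new AtomicInteger(0x00);

    // Models ldstub: atomically fetch the old byte and store all ones.
    public int ldstub() {
        return lockByte.getAndSet(0xFF);
    }

    // One attempt: we own the lock only if the byte we fetched was zero.
    public boolean tryLock() {
        return ldstub() == 0x00;
    }

    // Spin until an ldstub fetches zero.
    public void lock() {
        while (ldstub() != 0x00) {
            Thread.onSpinWait(); // spin hint, available since Java 9
        }
    }

    // Release by writing zero back to the byte.
    public void unlock() {
        lockByte.set(0x00);
    }

    public static void main(String[] args) throws InterruptedException {
        LdstubLock lock = new LdstubLock();
        int[] counter = {0};
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                lock.lock();
                try {
                    counter[0]++;
                } finally {
                    lock.unlock();
                }
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(counter[0]); // prints 200000
    }
}
```

Note that every pass through the spin loop issues another atomic read-and-write, which in the hardware picture above means another stretch of exclusive bus ownership.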

Figure 16-6. Packet-Switched Memory Bus Running ldstub
