Hardware Reference
Module field reads 001, representing the path it took. Since these requests do not
use any of the same switches, lines, or memory modules, they can go in parallel.
Now consider what would happen if CPU 000 simultaneously wanted to access
memory module 000. Its request would come into conflict with CPU 001's request
at switch 3A. One of them would have to wait. Unlike the crossbar switch, the
omega network is a blocking network. Not every set of requests can be processed
simultaneously. Conflicts can occur over the use of a wire or a switch, as well as
between requests to memory and replies from memory.
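The blocking behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an 8 x 8 omega network with a perfect shuffle ahead of each of its three stages, four 2 x 2 switches per stage labelled A through D from top to bottom, and routing on the Module bits from most significant to least significant (0 selects the upper output, 1 the lower). The function names `route` and `conflicts` are hypothetical. A switch shared by two requests is counted as a conflict, as the text does for switch 3A.

```python
STAGES = 3                      # log2(8) stages for an 8 x 8 network (assumed size)
MASK = (1 << STAGES) - 1        # keeps line numbers within 3 bits


def route(cpu: int, module: int) -> list[str]:
    """Return the switches (e.g. '1B') a request visits on its way to module."""
    line = cpu
    path = []
    for stage in range(STAGES):
        # Perfect shuffle: rotate the 3-bit line number left by one position.
        line = ((line << 1) | (line >> (STAGES - 1))) & MASK
        switch = line >> 1      # two adjacent lines feed one 2 x 2 switch
        path.append(f"{stage + 1}{'ABCD'[switch]}")
        # The switch inspects one Module bit, most significant first:
        # 0 selects the upper output, 1 selects the lower output.
        bit = (module >> (STAGES - 1 - stage)) & 1
        line = (line & ~1) | bit
    return path


def conflicts(a: tuple[int, int], b: tuple[int, int]) -> list[str]:
    """Return the switches that two (cpu, module) requests both need."""
    return sorted(set(route(*a)) & set(route(*b)))


print(route(0b011, 0b110))                        # ['1D', '2D', '3D']
print(route(0b001, 0b001))                        # ['1B', '2C', '3A']
print(conflicts((0b011, 0b110), (0b001, 0b001)))  # []
print(conflicts((0b000, 0b000), (0b001, 0b001)))  # ['3A']
```

Under these assumptions the first two requests share no switch and can proceed in parallel, while the request from CPU 000 to module 000 meets the request from CPU 001 at switch 3A.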
It is clearly desirable to spread the memory references uniformly across the
modules. One common technique is to use the low-order bits as the module number.
Consider, for example, a byte-oriented address space for a computer that
mostly accesses 32-bit words. The 2 low-order bits will usually be 00, but the next
3 bits will be uniformly distributed. By using these 3 bits as the module number,
consecutively addressed words will be in consecutive modules. A memory system
in which consecutive words are in different modules is said to be interleaved.
Interleaved memories maximize parallelism because most memory references are
to consecutive addresses. It is also possible to design switching networks that are
nonblocking and that offer multiple paths from each CPU to each memory module,
to spread the traffic better.
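The interleaving scheme from the example can be written out as a short Python sketch. It assumes the configuration given in the text (byte addresses, 32-bit words, and 3 module-select bits, hence 8 modules); the names `WORD_BYTES`, `NUM_MODULES`, and `module_of` are chosen here for illustration only.

```python
WORD_BYTES = 4      # 32-bit words: the 2 low-order address bits are the byte offset
NUM_MODULES = 8     # 3 module-select bits, as in the example in the text


def module_of(byte_address: int) -> int:
    """Return the module selected by bits 2..4 of a byte address."""
    return (byte_address >> 2) & (NUM_MODULES - 1)


# Sixteen consecutive word addresses: 0, 4, 8, ..., 60
addresses = [w * WORD_BYTES for w in range(16)]
modules = [module_of(a) for a in addresses]
print(modules)  # [0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7]
```

Consecutive words land in consecutive modules and wrap around after the eighth, so a sequential scan keeps all eight modules busy in turn.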
8.3.4 NUMA Multiprocessors
It should be clear by now that single-bus UMA multiprocessors are generally
limited to no more than a few dozen CPUs, and crossbar or switched multiprocessors
need a lot of (expensive) hardware and are not that much bigger. To get to
more than 100 CPUs, something has to give. Usually, what gives is the idea that
all memory modules have the same access time. This concession leads to the idea
of NUMA (NonUniform Memory Access) multiprocessors. Like their UMA
cousins, they provide a single address space across all the CPUs, but unlike the
UMA machines, access to local memory modules is faster than access to remote
ones. Thus all UMA programs will run without change on NUMA machines, but
the performance will be worse than on a UMA machine at the same clock speed.
All NUMA machines have three key characteristics that together distinguish
them from other multiprocessors:
1. There is a single address space visible to all CPUs.
2. Access to remote memory is done using LOAD and STORE instructions.
3. Access to remote memory is slower than access to local memory.
When the access time to remote memory is not hidden (because there is no caching),
the system is called NC-NUMA. When coherent caches are present, the
 
 