PARALLEL COMPUTER ARCHITECTURES - Structured Computer Organization

Module field reads 001, representing the path it took. Since these requests do not

use any of the same switches, lines, or memory modules, they can go in parallel.

Now consider what would happen if CPU 000 simultaneously wanted to access

memory module 000. Its request would come into conflict with CPU 001's request

at switch 3A. One of them would have to wait. Unlike the crossbar switch, the

omega network is a blocking network. Not every set of requests can be processed

simultaneously. Conflicts can occur over the use of a wire or a switch, as well as

between requests to memory and replies from memory.
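
The conflict at switch 3A can be reproduced with a small model. The following is a minimal sketch, not the book's own code. It assumes an 8 x 8 omega network with the usual perfect-shuffle wiring between stages, routing on the module number's bits from left to right (0 = upper output, 1 = lower output), and switches named by stage number plus a letter A to D, which matches the name 3A used above. It also assumes that CPU 001's request targets module 001. The helper names `rotl3`, `route`, and `shared_switches` are hypothetical.

```python
def rotl3(line):
    """Perfect shuffle on 8 lines: rotate the 3-bit line number left by one."""
    return ((line << 1) | (line >> 2)) & 0b111


def route(cpu, module):
    """Return the switches a request visits on its way from cpu to module."""
    line = cpu
    path = []
    for stage in range(3):
        line = rotl3(line)                    # assumed shuffle wiring into this stage
        switch = line >> 1                    # two lines feed each 2x2 switch
        bit = (module >> (2 - stage)) & 1     # leftmost module bit is used first
        line = (line & 0b110) | bit           # 0 = upper output, 1 = lower output
        path.append(f"{stage + 1}{'ABCD'[switch]}")
    assert line == module                     # the request must arrive at its module
    return path


def shared_switches(req_a, req_b):
    """Switches used by both requests; any overlap means one request must wait."""
    return sorted(set(route(*req_a)) & set(route(*req_b)))


print(route(0b000, 0b000))                                      # ['1A', '2A', '3A']
print(shared_switches((0b000, 0b000), (0b001, 0b001)))          # ['3A']
```

This sketch checks only for shared switches. It does not model contention for individual wires or between requests and replies, which the text also names as sources of conflict.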

It is clearly desirable to spread the memory references uniformly across the

modules. One common technique is to use the low-order bits as the module number.

Consider, for example, a byte-oriented address space for a computer that

mostly accesses 32-bit words. The 2 low-order bits will usually be 00, but the next

3 bits will be uniformly distributed. By using these 3 bits as the module number,

consecutively addressed words will be in consecutive modules. A memory system

in which consecutive words are in different modules is said to be interleaved.

Interleaved memories maximize parallelism because most memory references are

to consecutive addresses. It is also possible to design switching networks that are

nonblocking and that offer multiple paths from each CPU to each memory module,

to spread the traffic better.
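
The interleaving rule described above can be illustrated in a few lines. This is a minimal sketch that assumes 4-byte words and 8 memory modules (3 module-select bits), as in the example in the text. The function name `module_of` and the starting address `0x1000` are made up for illustration.

```python
WORD_BYTES = 4     # 32-bit words, so the 2 low-order address bits are usually 00
NUM_MODULES = 8    # 3 module-select bits


def module_of(byte_addr):
    """Drop the 2 byte-offset bits, then keep the next 3 bits as the module number."""
    return (byte_addr >> 2) & (NUM_MODULES - 1)


# Ten consecutive word addresses, starting at an arbitrary aligned address.
addrs = [0x1000 + WORD_BYTES * i for i in range(10)]
print([module_of(a) for a in addrs])   # [0, 1, 2, 3, 4, 5, 6, 7, 0, 1]
```

Consecutive words land in consecutive modules and wrap around after the eighth, so a sequential scan keeps all the modules busy in turn.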

It should be clear by now that single-bus UMA multiprocessors are generally

limited to no more than a few dozen CPUs, and crossbar or switched multiprocessors

need a lot of (expensive) hardware and are not that much bigger. To get to

more than 100 CPUs, something has to give. Usually, what gives is the idea that

all memory modules have the same access time. This concession leads to the idea

of NUMA (NonUniform Memory Access) multiprocessors. Like their UMA

cousins, they provide a single address space across all the CPUs, but unlike the

UMA machines, access to local memory modules is faster than access to remote

ones. Thus all UMA programs will run without change on NUMA machines, but

the performance will be worse than on a UMA machine at the same clock speed.

All NUMA machines have three key characteristics that together distinguish

them from other multiprocessors:

1. There is a single address space visible to all CPUs.

2. Access to remote memory is done using LOAD and STORE instructions.

3. Access to remote memory is slower than access to local memory.
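
The effect of the third characteristic can be estimated with a simple weighted average. This is only a rough sketch: the function name `mean_access_time` is hypothetical, and the 100 ns and 400 ns latencies are invented for illustration, not measurements of any real machine.

```python
def mean_access_time(local_fraction, t_local_ns, t_remote_ns):
    """Average memory access time when a given fraction of references is local."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * t_local_ns + (1.0 - local_fraction) * t_remote_ns


# Hypothetical latencies: 100 ns local, 400 ns remote.
for fraction in (1.0, 0.75, 0.5):
    print(fraction, mean_access_time(fraction, 100.0, 400.0))
# 1.0 100.0
# 0.75 175.0
# 0.5 250.0
```

Under these assumed numbers, the average access time grows quickly as more references go to remote modules, which is why the placement of code and data matters on a NUMA machine.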

When the access time to remote memory is not hidden (because there is no caching),

the system is called NC-NUMA. When coherent caches are present, the
