dividing up the data and placing them in the optimal locations is a major issue on a
multicomputer. It is less of an issue on a multiprocessor since placement does not
affect correctness or programmability although it may affect performance. In
short, programming a multicomputer is much more difficult than programming a
multiprocessor.
Under these conditions, why would anyone build multicomputers, when multi-
processors are easier to program? The answer is simple: large multicomputers are
much simpler and cheaper to build than multiprocessors with the same number of
CPUs. Implementing a memory shared by even a few hundred CPUs is a substan-
tial undertaking, whereas building a multicomputer with 10,000 CPUs or more is
straightforward. Later in this chapter we will study a multicomputer with over
50,000 CPUs.
Thus we have a dilemma: multiprocessors are hard to build but easy to pro-
gram whereas multicomputers are easy to build but hard to program. This observa-
tion has led to a great deal of effort to construct hybrid systems that are relatively
easy to build and relatively easy to program. This work has led to the realization
that shared memory can be implemented in various ways, each with its own set of
advantages and disadvantages. In fact, much research in parallel architectures
these days relates to the convergence of multiprocessor and multicomputer archi-
tectures into hybrid forms that combine the strengths of each. The holy grail here
is to find designs that are scalable, that is, that continue to perform well as more and
more CPUs are added.
One approach to building hybrid systems is based on the fact that modern com-
puter systems are not monolithic but are constructed as a series of layers—the
theme of this book. This insight opens the possibility of implementing the shared
memory at any one of several layers, as shown in Fig. 8-21. In Fig. 8-21(a) we see
the shared memory being implemented by the hardware as a true multiprocessor.
In this design, there is a single copy of the operating system with a single set of
tables, in particular, the memory allocation table. When a process needs more
memory, it traps to the operating system, which then looks in its table for a free
page and maps the page into the caller's address space. As far as the operating
system is concerned, there is a single memory and it keeps track of which process
owns which page in software. There are many ways to implement hardware shared
memory, as we will see later.
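
The sketch below illustrates the idea of that single, system-wide memory allocation table on a multiprocessor: one table, protected by a lock, records which process owns each page frame, and an allocation request simply scans it for a free entry. The names and sizes here are illustrative, not taken from any particular operating system.

    /* A minimal sketch (illustrative, not from the text) of the single,
     * shared memory allocation table a multiprocessor OS might keep. */
    #include <pthread.h>
    #include <stdio.h>

    #define NUM_FRAMES 16
    #define FREE (-1)

    static int owner[NUM_FRAMES];           /* which process owns each frame */
    static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Handle a "need more memory" trap: find a free frame and assign it. */
    int allocate_frame(int pid)
    {
        pthread_mutex_lock(&table_lock);    /* one table, so serialize access */
        for (int f = 0; f < NUM_FRAMES; f++) {
            if (owner[f] == FREE) {
                owner[f] = pid;             /* map frame f into pid's space */
                pthread_mutex_unlock(&table_lock);
                return f;
            }
        }
        pthread_mutex_unlock(&table_lock);
        return -1;                          /* no free frame available */
    }

    int main(void)
    {
        for (int f = 0; f < NUM_FRAMES; f++) owner[f] = FREE;
        printf("process 7 got frame %d\n", allocate_frame(7));
        printf("process 9 got frame %d\n", allocate_frame(9));
        return 0;
    }

Because every CPU traps into the same operating system and consults the same table, the hardware shared memory is all that is needed to keep the bookkeeping consistent.
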
A second possibility is to use multicomputer hardware and have the operating
system simulate shared memory by providing a single system-wide paged shared
virtual address space. In this approach, called DSM (Distributed Shared Memory)
(Li and Hudak, 1989), each page is located in one of the memories of
Fig. 8-20(a). Each machine has its own virtual memory and its own page tables.
When a CPU does a LOAD or STORE on a page it does not have, a trap to the oper-
ating system occurs. The operating system then locates the page and asks the CPU
currently holding it in its memory to unmap the page and send it over the intercon-
nection network. When it finally arrives, the page is mapped in and the faulting
instruction is restarted.
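
The following sketch shows, in simplified form, the sequence just described: a fault on a page that is not local causes the faulting node to look up the current holder, request the page over the interconnect, map it in, and then restart the faulting instruction. The data structures and function names are hypothetical and omit the ownership tracking and consistency machinery a real DSM system needs.

    /* A minimal sketch of a DSM page-fault handler (illustrative names,
     * not the actual protocol of Li and Hudak). */
    #include <stdio.h>
    #include <string.h>

    #define NUM_PAGES 8
    #define PAGE_SIZE 4096
    #define LOCAL_NODE 0

    static int page_owner[NUM_PAGES];       /* which node currently holds each page */
    static char local_memory[NUM_PAGES][PAGE_SIZE];

    /* Stand-in for asking the remote node to unmap the page and send it. */
    static void fetch_page_from(int node, int page, char *dest)
    {
        printf("requesting page %d from node %d\n", page, node);
        memset(dest, 0, PAGE_SIZE);         /* pretend the page contents arrived */
    }

    /* Called when a LOAD or STORE touches a page this node does not hold. */
    void dsm_page_fault(int page)
    {
        int holder = page_owner[page];
        if (holder != LOCAL_NODE) {
            fetch_page_from(holder, page, local_memory[page]);
            page_owner[page] = LOCAL_NODE;  /* the page is now mapped in locally */
        }
        /* At this point the faulting instruction would be restarted. */
    }

    int main(void)
    {
        for (int p = 0; p < NUM_PAGES; p++)
            page_owner[p] = p % 2;          /* spread pages over two nodes */
        dsm_page_fault(3);                  /* page 3 lives on node 1, so it is fetched */
        return 0;
    }

The programmer still sees a single shared address space; the difference is that the paging traffic now crosses the interconnection network instead of a memory bus.
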