dividing up the data and placing them in the optimal locations is a major issue on a
multicomputer. It is less of an issue on a multiprocessor since placement does not
affect correctness or programmability although it may affect performance. In
short, programming a multicomputer is much more difficult than programming a
multiprocessor.
Under these conditions, why would anyone build multicomputers, when multi-
processors are easier to program? The answer is simple: large multicomputers are
much simpler and cheaper to build than multiprocessors with the same number of
CPUs. Implementing a memory shared by even a few hundred CPUs is a substan-
tial undertaking, whereas building a multicomputer with 10,000 CPUs or more is
straightforward. Later in this chapter we will study a multicomputer with over
50,000 CPUs.
Thus we have a dilemma: multiprocessors are hard to build but easy to pro-
gram whereas multicomputers are easy to build but hard to program. This observa-
tion has led to a great deal of effort to construct hybrid systems that are relatively
easy to build and relatively easy to program. This work has led to the realization
that shared memory can be implemented in various ways, each with its own set of
advantages and disadvantages. In fact, much research in parallel architectures
these days relates to the convergence of multiprocessor and multicomputer archi-
tectures into hybrid forms that combine the strengths of each. The holy grail here
is to find designs that are scalable, that is, that continue to perform well as more and
more CPUs are added.
One approach to building hybrid systems is based on the fact that modern com-
puter systems are not monolithic but are constructed as a series of layers—the
theme of this book. This insight opens the possibility of implementing the shared
memory at any one of several layers, as shown in Fig. 8-21. In Fig. 8-21(a) we see
the shared memory being implemented by the hardware as a true multiprocessor.
In this design, there is a single copy of the operating system with a single set of
tables, in particular, the memory allocation table. When a process needs more
memory, it traps to the operating system, which then looks in its table for a free
page and maps the page into the caller's address space. As far as the operating
system is concerned, there is a single memory and it keeps track of which process
owns which page in software. There are many ways to implement hardware shared
memory, as we will see later.
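
The sketch below illustrates the idea of that single, system-wide memory allocation table on a multiprocessor: one table, protected by a lock, records which process owns each page frame, and an allocation request simply scans it for a free entry. The names and sizes here are illustrative, not taken from any particular operating system.

    /* A minimal sketch (illustrative, not from the text) of the single,
     * shared memory allocation table a multiprocessor OS might keep. */
    #include <pthread.h>
    #include <stdio.h>

    #define NUM_FRAMES 16
    #define FREE (-1)

    static int owner[NUM_FRAMES];           /* which process owns each frame */
    static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Handle a "need more memory" trap: find a free frame and assign it. */
    int allocate_frame(int pid)
    {
        pthread_mutex_lock(&table_lock);    /* one table, so serialize access */
        for (int f = 0; f < NUM_FRAMES; f++) {
            if (owner[f] == FREE) {
                owner[f] = pid;             /* map frame f into pid's space */
                pthread_mutex_unlock(&table_lock);
                return f;
            }
        }
        pthread_mutex_unlock(&table_lock);
        return -1;                          /* no free frame available */
    }

    int main(void)
    {
        for (int f = 0; f < NUM_FRAMES; f++) owner[f] = FREE;
        printf("process 7 got frame %d\n", allocate_frame(7));
        printf("process 9 got frame %d\n", allocate_frame(9));
        return 0;
    }

Because every CPU traps into the same operating system and consults the same table, the hardware shared memory is all that is needed to keep the bookkeeping consistent.
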
A second possibility is to use multicomputer hardware and have the operating
system simulate shared memory by providing a single system-wide paged shared
virtual address space. In this approach, called DSM (Distributed Shared Memory)
(Li and Hudak, 1989), each page is located in one of the memories of
Fig. 8-20(a). Each machine has its own virtual memory and its own page tables.
When a CPU does a LOAD or STORE on a page it does not have, a trap to the oper-
ating system occurs. The operating system then locates the page and asks the CPU
currently holding it in its memory to unmap the page and send it over the intercon-
nection network. When it finally arrives, the page is mapped in and the faulting
instruction is restarted.
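
The following sketch shows, in simplified form, the sequence just described: a fault on a page that is not local causes the faulting node to look up the current holder, request the page over the interconnect, map it in, and then restart the faulting instruction. The data structures and function names are hypothetical and omit the ownership tracking and consistency machinery a real DSM system needs.

    /* A minimal sketch of a DSM page-fault handler (illustrative names,
     * not the actual protocol of Li and Hudak). */
    #include <stdio.h>
    #include <string.h>

    #define NUM_PAGES 8
    #define PAGE_SIZE 4096
    #define LOCAL_NODE 0

    static int page_owner[NUM_PAGES];       /* which node currently holds each page */
    static char local_memory[NUM_PAGES][PAGE_SIZE];

    /* Stand-in for asking the remote node to unmap the page and send it. */
    static void fetch_page_from(int node, int page, char *dest)
    {
        printf("requesting page %d from node %d\n", page, node);
        memset(dest, 0, PAGE_SIZE);         /* pretend the page contents arrived */
    }

    /* Called when a LOAD or STORE touches a page this node does not hold. */
    void dsm_page_fault(int page)
    {
        int holder = page_owner[page];
        if (holder != LOCAL_NODE) {
            fetch_page_from(holder, page, local_memory[page]);
            page_owner[page] = LOCAL_NODE;  /* the page is now mapped in locally */
        }
        /* At this point the faulting instruction would be restarted. */
    }

    int main(void)
    {
        for (int p = 0; p < NUM_PAGES; p++)
            page_owner[p] = p % 2;          /* spread pages over two nodes */
        dsm_page_fault(3);                  /* page 3 lives on node 1, so it is fetched */
        return 0;
    }

The programmer still sees a single shared address space; the difference is that the paging traffic now crosses the interconnection network instead of a memory bus.
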