especially effective when jobs are submitted during the daytime for execution at
night, so the job scheduler has all the information about all the jobs in advance and
can run them in optimal order, as illustrated in Fig. 8-45(c).
8.4.6 Application-Level Shared Memory
That multicomputers scale to larger sizes than multiprocessors should be clear
from our examples. This reality has led to the development of message-passing
systems like MPI. Many programmers do not like this model and would like to
have the illusion of shared memory, even if it is not really there. Achieving this
goal would be the best of both worlds: large, inexpensive hardware (at least, per
node) plus ease of programming. This is the holy grail of parallel computing.
Many researchers have concluded that while shared memory at the architectural level may not scale well, there may be other ways to achieve the same goal. From Fig. 8-21, we see that there are other levels at which a shared memory can be introduced. In the following sections, we will look at some ways that shared memory can be introduced into the programming model on a multicomputer, without it being present at the hardware level.
Distributed Shared Memory
One class of application-level shared-memory system is the page-based system. It goes under the name of DSM (Distributed Shared Memory). The idea is simple: a collection of CPUs on a multicomputer share a common paged virtual address space. In the simplest version, each page is held in the RAM of exactly one CPU. In Fig. 8-46(a), we see a shared virtual address space consisting of 16 pages, spread over four CPUs.
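
To make the bookkeeping concrete, the directory a DSM run-time system must maintain can be pictured as a table mapping each page to the node that currently owns it. The C fragment below is a minimal sketch, assuming (for illustration only) a round-robin placement of the 16 pages over four nodes; the names page_owner and page_is_local are invented, not taken from any real DSM system.

/* Illustrative page directory: 16 shared pages spread over 4 nodes.
   The round-robin placement is an assumption for the demo, not the
   exact layout of Fig. 8-46(a). */
#include <stdio.h>

#define NUM_PAGES  16
#define LOCAL_NODE 0

/* Node that currently holds each shared page. */
static int page_owner[NUM_PAGES] = {
    0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3
};

static int page_is_local(int page)
{
    return page_owner[page] == LOCAL_NODE;
}

int main(void)
{
    for (int p = 0; p < NUM_PAGES; p++)
        printf("page %2d -> node %d%s\n", p, page_owner[p],
               page_is_local(p) ? "  (local, no fault on access)" : "");
    return 0;
}

An access to a page whose owner is LOCAL_NODE proceeds at full speed; any other access must trap into the run-time system, as described next.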
When a CPU references a page in its own local RAM, the read or write just happens without any further delay. However, when a CPU references a page in a remote memory, it gets a page fault. Instead of the missing page being brought in from disk, though, the run-time system or operating system sends a message to the node holding the page to unmap it and send it over. After it has arrived, it is mapped in and the faulting instruction is restarted, just as with a normal page fault. In Fig. 8-46(b), we see the situation after CPU 0 has faulted on page 10: it is moved from CPU 1 to CPU 0.
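
The fault path itself is short. The following C sketch simulates it on a single machine: fetch_page_from stands in for the real message exchange in which the owner unmaps the page and ships its contents over the interconnect. It is an invented stand-in for illustration, not IVY's actual interface.

/* Toy sketch of the migrating-page fault path. */
#include <stdio.h>
#include <string.h>

#define NUM_PAGES 16
#define PAGE_SIZE 64                        /* tiny pages keep the demo small */

static int  page_owner[NUM_PAGES];          /* node currently holding each page */
static char frames[NUM_PAGES][PAGE_SIZE];   /* this node's page frames */

/* Simulate "owner unmaps the page and sends its contents here". */
static void fetch_page_from(int owner, int page, char *dst)
{
    char remote[PAGE_SIZE] = { 0 };
    snprintf(remote, sizeof(remote), "payload of page %d (was on node %d)",
             page, owner);
    memcpy(dst, remote, PAGE_SIZE);
}

/* Run-time fault handler: called when a reference hits a remote page. */
static void dsm_page_fault(int me, int page)
{
    /* Ask the owner to unmap the page and ship it here. */
    fetch_page_from(page_owner[page], page, frames[page]);

    /* Map it in locally; the faulting instruction is then restarted. */
    page_owner[page] = me;
}

int main(void)
{
    page_owner[10] = 1;        /* page 10 starts on CPU 1, as in Fig. 8-46 */
    dsm_page_fault(0, 10);     /* CPU 0 touches page 10 and faults */
    printf("page 10 now on node %d: \"%s\"\n", page_owner[10], frames[10]);
    return 0;
}

Note that ownership migrates with the page: after the fault, CPU 0 holds the only copy, and a later reference by CPU 1 would fault in the other direction.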
This basic idea was first implemented in IVY (Li and Hudak, 1989). It provides a fully shared, sequentially consistent memory on a multicomputer. However, many optimizations are possible to improve the performance. The first optimization, present in IVY, is to allow pages that are marked as read-only to be present at multiple nodes at the same time. Thus, when a page fault occurs, a copy of the page is sent to the faulting machine, but the original stays where it is, since there is no danger of conflicts. The situation of two CPUs sharing a read-only page (page 10) is illustrated in Fig. 8-46(c).
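
With replication, the per-page state grows from a single owner into a copyset recording which nodes hold a copy. The sketch below illustrates this; the write-fault half, which must invalidate the other copies before a write can proceed (the write-invalidate approach IVY takes), is included for completeness. All of the names and bookkeeping are again invented for illustration.

/* Sketch of read-only replication with a per-page copyset. */
#include <stdbool.h>
#include <stdio.h>

#define NUM_PAGES 16
#define NUM_NODES 4

static bool has_copy[NUM_PAGES][NUM_NODES];   /* copyset per page */
static bool read_only[NUM_PAGES];             /* copies write-protected? */

/* Read fault: ship a copy; the original stays put; all copies read-only. */
static void dsm_read_fault(int me, int page)
{
    has_copy[page][me] = true;
    read_only[page] = true;
}

/* Write fault: unmap every other copy before granting write access. */
static void dsm_write_fault(int me, int page)
{
    for (int n = 0; n < NUM_NODES; n++)
        has_copy[page][n] = (n == me);
    read_only[page] = false;   /* now the sole, writable copy */
}

int main(void)
{
    has_copy[10][1] = true;    /* page 10 starts on CPU 1 */
    dsm_read_fault(0, 10);     /* CPU 0 reads it: two read-only copies */
    printf("copies: node0=%d node1=%d read-only=%d\n",
           has_copy[10][0], has_copy[10][1], read_only[10]);
    dsm_write_fault(1, 10);    /* CPU 1 writes: CPU 0's copy is invalidated */
    printf("copies: node0=%d node1=%d read-only=%d\n",
           has_copy[10][0], has_copy[10][1], read_only[10]);
    return 0;
}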