especially effective when jobs are submitted during the daytime for execution at
night, so the job scheduler has all the information about all the jobs in advance and
can run them in optimal order, as illustrated in Fig. 8-45(c).
8.4.6 Application-Level Shared Memory
That multicomputers scale to larger sizes than multiprocessors should be clear
from our examples. This reality has led to the development of message-passing
systems like MPI. Many programmers do not like this model and would like to
have the illusion of shared memory, even if it is not really there. Achieving this
goal would be the best of both worlds: large, inexpensive hardware (at least, per
node) plus ease of programming. This is the holy grail of parallel computing.
Many researchers have concluded that while shared memory at the architectural level may not scale well, there may be other ways to achieve the same goal. From Fig. 8-21, we see that there are other levels at which a shared memory can be introduced. In the following sections, we will look at some ways that shared memory can be introduced into the programming model on a multicomputer, without it being present at the hardware level.
Distributed Shared Memory
One class of application-level shared-memory system is the page-based system. It goes under the name of DSM (Distributed Shared Memory). The idea is simple: a collection of CPUs on a multicomputer share a common paged virtual address space. In the simplest version, each page is held in the RAM of exactly one CPU. In Fig. 8-46(a), we see a shared virtual address space consisting of 16 pages, spread over four CPUs.
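
To make the bookkeeping concrete, the directory a DSM run-time system must maintain can be pictured as a table mapping each page to the node that currently owns it. The C fragment below is a minimal sketch, assuming (for illustration only) a round-robin placement of the 16 pages over four nodes; the names page_owner and page_is_local are invented, not taken from any real DSM system.

/* Illustrative page directory: 16 shared pages spread over 4 nodes.
   The round-robin placement is an assumption for the demo, not the
   exact layout of Fig. 8-46(a). */
#include <stdio.h>

#define NUM_PAGES  16
#define LOCAL_NODE 0

/* Node that currently holds each shared page. */
static int page_owner[NUM_PAGES] = {
    0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3
};

static int page_is_local(int page)
{
    return page_owner[page] == LOCAL_NODE;
}

int main(void)
{
    for (int p = 0; p < NUM_PAGES; p++)
        printf("page %2d -> node %d%s\n", p, page_owner[p],
               page_is_local(p) ? "  (local, no fault on access)" : "");
    return 0;
}

An access to a page whose owner is LOCAL_NODE proceeds at full speed; any other access must trap into the run-time system, as described next.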
When a CPU references a page in its own local RAM, the read or write just happens without any further delay. However, when a CPU references a page in a remote memory, it gets a page fault. Instead of the missing page being brought in from disk, though, the run-time system or operating system sends a message to the node holding the page to unmap it and send it over. After it has arrived, it is mapped in and the faulting instruction is restarted, just as with a normal page fault. In Fig. 8-46(b), we see the situation after CPU 0 has faulted on page 10: it is moved from CPU 1 to CPU 0.
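
The fault path itself is short. The following C sketch simulates it on a single machine: fetch_page_from stands in for the real message exchange in which the owner unmaps the page and ships its contents over the interconnect. It is an invented stand-in for illustration, not IVY's actual interface.

/* Toy sketch of the migrating-page fault path. */
#include <stdio.h>
#include <string.h>

#define NUM_PAGES 16
#define PAGE_SIZE 64                        /* tiny pages keep the demo small */

static int  page_owner[NUM_PAGES];          /* node currently holding each page */
static char frames[NUM_PAGES][PAGE_SIZE];   /* this node's page frames */

/* Simulate "owner unmaps the page and sends its contents here". */
static void fetch_page_from(int owner, int page, char *dst)
{
    char remote[PAGE_SIZE] = { 0 };
    snprintf(remote, sizeof(remote), "payload of page %d (was on node %d)",
             page, owner);
    memcpy(dst, remote, PAGE_SIZE);
}

/* Run-time fault handler: called when a reference hits a remote page. */
static void dsm_page_fault(int me, int page)
{
    /* Ask the owner to unmap the page and ship it here. */
    fetch_page_from(page_owner[page], page, frames[page]);

    /* Map it in locally; the faulting instruction is then restarted. */
    page_owner[page] = me;
}

int main(void)
{
    page_owner[10] = 1;        /* page 10 starts on CPU 1, as in Fig. 8-46 */
    dsm_page_fault(0, 10);     /* CPU 0 touches page 10 and faults */
    printf("page 10 now on node %d: \"%s\"\n", page_owner[10], frames[10]);
    return 0;
}

Note that ownership migrates with the page: after the fault, CPU 0 holds the only copy, and a later reference by CPU 1 would fault in the other direction.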
This basic idea was first implemented in IVY (Li and Hudak, 1989). It provides a fully shared, sequentially consistent memory on a multicomputer. However, many optimizations are possible to improve the performance. The first optimization, present in IVY, is to allow pages that are marked as read-only to be present at multiple nodes at the same time. Thus, when a page fault occurs, a copy of the page is sent to the faulting machine, but the original stays where it is, since there is no danger of conflicts. The situation of two CPUs sharing a read-only page (page 10) is illustrated in Fig. 8-46(c).
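
With replication, the per-page state grows from a single owner into a copyset recording which nodes hold a copy. The sketch below illustrates this; the write-fault half, which must invalidate the other copies before a write can proceed (the write-invalidate approach IVY takes), is included for completeness. All of the names and bookkeeping are again invented for illustration.

/* Sketch of read-only replication with a per-page copyset. */
#include <stdbool.h>
#include <stdio.h>

#define NUM_PAGES 16
#define NUM_NODES 4

static bool has_copy[NUM_PAGES][NUM_NODES];   /* copyset per page */
static bool read_only[NUM_PAGES];             /* copies write-protected? */

/* Read fault: ship a copy; the original stays put; all copies read-only. */
static void dsm_read_fault(int me, int page)
{
    has_copy[page][me] = true;
    read_only[page] = true;
}

/* Write fault: unmap every other copy before granting write access. */
static void dsm_write_fault(int me, int page)
{
    for (int n = 0; n < NUM_NODES; n++)
        has_copy[page][n] = (n == me);
    read_only[page] = false;   /* now the sole, writable copy */
}

int main(void)
{
    has_copy[10][1] = true;    /* page 10 starts on CPU 1 */
    dsm_read_fault(0, 10);     /* CPU 0 reads it: two read-only copies */
    printf("copies: node0=%d node1=%d read-only=%d\n",
           has_copy[10][0], has_copy[10][1], read_only[10]);
    dsm_write_fault(1, 10);    /* CPU 1 writes: CPU 0's copy is invalidated */
    printf("copies: node0=%d node1=%d read-only=%d\n",
           has_copy[10][0], has_copy[10][1], read_only[10]);
    return 0;
}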