PARALLEL COMPUTER ARCHITECTURES - Structured Computer Organization


of release consistency (Amza, 1996). Potentially writable pages may be present at

multiple nodes at the same time, but before doing a write, a process must first do

an acquire operation to signal its intention. At that point, all copies but the most

recent one are invalidated. No other copies may be made until the corresponding

release is done, at which time the page can be shared again.
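
The acquire/release discipline just described can be illustrated with a small single-process simulation. This is only a sketch: the class `SharedPage`, its method names, and the simplification that the acquiring node already holds the most recent copy are assumptions made for illustration, not the actual Treadmarks interface.

```python
class SharedPage:
    """Toy model of one potentially writable page under acquire/release."""

    def __init__(self, data):
        self.data = data      # contents of the most recent copy
        self.holders = set()  # nodes that currently hold a valid copy
        self.writer = None    # node between acquire and release, if any

    def fetch(self, node):
        """Give `node` a copy; refused while another node has acquired the page."""
        if self.writer is not None and node != self.writer:
            raise RuntimeError("no new copies until the writer releases")
        self.holders.add(node)
        return self.data

    def acquire(self, node):
        """Signal the intention to write: all other copies are invalidated."""
        if self.writer is not None:
            raise RuntimeError("page already acquired")
        self.writer = node
        self.holders = {node}  # assumption: the writer holds the most recent copy

    def write(self, node, data):
        if self.writer != node:
            raise RuntimeError("write without a preceding acquire")
        self.data = data

    def release(self, node):
        """After the release the page can be shared again."""
        if self.writer != node:
            raise RuntimeError("release without a matching acquire")
        self.writer = None
```

A typical sequence would be `fetch` on several nodes, then `acquire`, `write`, and `release` on one of them, after which the other nodes have to `fetch` the page again to see the new contents.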

A second optimization done in Treadmarks is to initially map each writable

page in read-only mode. When the page is first written to, a protection fault occurs

and the system makes a copy of the page, called the twin. Then the original page

is mapped in as read-write and subsequent writes can go at full speed. When a

remote page fault happens later and the page has to be shipped to the faulting

node, a word-by-word comparison is done between the current page and the twin. Only those

words that have been changed are sent, reducing the size of the messages.
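
The twin-and-compare step can be sketched in a few lines. The 4-byte word size, the function names `make_twin`, `diff`, and `apply_diff`, and the assumption that the receiving node still holds the old contents of the page are all illustrative choices, not details taken from Treadmarks itself.

```python
WORD = 4  # assumed word size in bytes; page length must be a multiple of it


def make_twin(page):
    """Copy taken at the first write fault, before the page is modified."""
    return bytes(page)


def diff(page, twin):
    """Compare word by word; return (word_index, new_word) for changed words."""
    changes = []
    for off in range(0, len(page), WORD):
        current = bytes(page[off:off + WORD])
        if current != twin[off:off + WORD]:
            changes.append((off // WORD, current))
    return changes


def apply_diff(old_copy, changes):
    """On the receiving node, patch the old copy with the changed words."""
    for index, word in changes:
        old_copy[index * WORD:(index + 1) * WORD] = word


# Example: a 4096-byte page in which two words are modified after twinning.
page = bytearray(4096)
twin = make_twin(page)
page[8:12] = b"\x01\x02\x03\x04"   # changes word 2
page[100] = 0xFF                   # changes word 25
changes = diff(page, twin)

remote = bytearray(twin)           # the remote node's stale copy
apply_diff(remote, changes)
```

Here only two index/word pairs need to travel over the network instead of the full 4096-byte page.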

When a page fault occurs, the missing page has to be located. Various solutions

are possible, including those used in NUMA and COMA machines, such as

(home-based) directories. In fact, many of the solutions used in DSM are also

applicable to NUMA and COMA because DSM is really just a software

implementation of NUMA or COMA with each page being treated like a cache line.
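
One way to picture a home-based directory is the sketch below. It assumes a fixed mapping from page number to home node (page number modulo the number of nodes) and a per-node table of current owners; the class `Directory` and its methods are hypothetical names chosen for illustration.

```python
class Directory:
    """Home-based directory: each page has a fixed home node that tracks its owner."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        # One table per node; node h records owners only for pages whose home is h.
        self.tables = [dict() for _ in range(num_nodes)]

    def home_of(self, page_no):
        # Assumed static mapping; real systems may use other schemes.
        return page_no % self.num_nodes

    def set_owner(self, page_no, node):
        """Record at the home node that `node` now holds the page."""
        self.tables[self.home_of(page_no)][page_no] = node

    def locate(self, page_no):
        """What a faulting node does: ask the page's home who holds it now."""
        home = self.home_of(page_no)
        # If the home has no entry, the page is assumed to still live at home.
        return self.tables[home].get(page_no, home)
```

With four nodes, a fault on page 10 is sent to node 2 (its home), which either answers itself or names the node that currently holds the page.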

DSM is a hot area of research. Interesting systems include CASHMERE

(Kontothanassis et al., 1997; Stets et al., 1997), CRL (Johnson et al., 1995),

Shasta (Scales et al., 1996), and Treadmarks (Amza, 1996; Lu et al., 1997).

Linda

Page-based DSM systems like IVY and Treadmarks use the MMU hardware to

trap accesses to missing pages. While making and sending differences instead of

whole pages helps, the fact remains that pages are an unnatural unit for sharing, so

other approaches have been tried.

One such approach is Linda, which provides processes on multiple machines

with a highly structured distributed shared memory (Carriero and Gelernter, 1989).

This memory is accessed through a small set of primitive operations that can be

added to existing languages, such as C and FORTRAN, to form parallel languages,

in this case, C-Linda and FORTRAN-Linda.

The unifying concept behind Linda is that of an abstract tuple space, which is

global to the entire system and accessible to all processes in it. Tuple space is like

a global shared memory, only with a certain built-in structure. The tuple space

contains some number of tuples, each consisting of one or more fields. For

C-Linda, field types include integers, long integers, and floating-point numbers, as

well as composite types such as arrays (including strings) and structures (but not

other tuples). Figure 8-47 shows three tuples as examples.

Four operations are provided on tuples. The first one, out, puts a tuple into the

tuple space. For example,

out("abc", 2, 5);
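
To make the idea concrete, here is a minimal in-memory sketch of a tuple space that supports only the out operation. It is written in Python rather than C-Linda, and the class `TupleSpace`, the `snapshot` helper, and the use of a lock to stand in for system-wide sharing are assumptions made for illustration; a real Linda implementation distributes the space across machines.

```python
import threading


class TupleSpace:
    """Toy tuple space: a shared collection of tuples with an `out` operation."""

    def __init__(self):
        self._tuples = []
        self._lock = threading.Lock()  # stands in for system-wide sharing

    def out(self, *fields):
        """Put a tuple consisting of the given fields into the tuple space."""
        if not fields:
            raise ValueError("a tuple needs at least one field")
        for field in fields:
            if isinstance(field, tuple):
                # Mirrors the rule that a field may not itself be a tuple.
                raise TypeError("a field may not be another tuple")
        with self._lock:
            self._tuples.append(tuple(fields))

    def snapshot(self):
        """Hypothetical helper for inspection only; not a Linda operation."""
        with self._lock:
            return list(self._tuples)


space = TupleSpace()
space.out("abc", 2, 5)  # counterpart of the C-Linda call out("abc", 2, 5);
```

After the call, the space holds the single three-field tuple ("abc", 2, 5).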
