generates one new process on each of machines 1 through n, running the program foobar in each of them. As these n new processes (and the parent) execute in parallel, they can all push items onto and pop items from the shared stack s as though they were all running on a shared-memory multiprocessor. It is the job of the run-time system to sustain the illusion of shared memory where it really does not exist.
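Orca code itself is not shown here, but the idea can be sketched loosely in Python (all names below are hypothetical): n worker processes all push and pop on one shared stack-like object, with a manager process standing in for the run-time system that sustains the shared-memory illusion.

from multiprocessing import Manager, Process

def foobar(s):
    # Each worker treats s as though it lived in shared memory.
    s.append(1)    # push an item
    s.pop()        # pop an item (never fails: every pop follows its own push)

if __name__ == "__main__":
    n = 4
    with Manager() as mgr:
        s = mgr.list()    # proxy to a single stack held by the manager
        workers = [Process(target=foobar, args=(s,)) for _ in range(n)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()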
Operations on shared objects are atomic and sequentially consistent. The system guarantees that if multiple processes perform operations on the same shared object nearly simultaneously, the system chooses some order and all processes see the same order of events.
Orca integrates shared data and synchronization in a way not present in page-based DSM systems. Two kinds of synchronization are needed in parallel programs. The first kind is mutual-exclusion synchronization, to keep two processes from executing the same critical region at the same time. In Orca, each operation on a shared object is effectively like a critical region because the system guarantees that the final result is the same as if all the critical regions were executed one at a time (i.e., sequentially). In this respect, an Orca object is like a distributed form of a monitor (Hoare, 1975).
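To make the monitor analogy concrete, here is a minimal sketch in Python (not Orca) of an object whose operations are all serialized by a single lock, so that each operation behaves as a critical region:

import threading

class SharedCounter:
    # Monitor-style object: every operation runs under one lock.
    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        with self._lock:    # the whole operation is one critical region
            self._value += 1

    def value(self):
        with self._lock:
            return self._value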
The other kind of synchronization is condition synchronization, in which a process blocks waiting for some condition to hold. In Orca, condition synchronization is done with guards. In the example of Fig. 8-48, a process trying to pop an item from an empty stack will be suspended until the stack is no longer empty. After all, you cannot pop a word from an empty stack.
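Orca's actual guard syntax is not reproduced here, but its blocking behavior can be sketched with a Python condition variable: pop waits until its "guard" (a nonempty stack) holds, and push wakes any suspended popper.

import threading

class GuardedStack:
    def __init__(self):
        self._cond = threading.Condition()
        self._items = []

    def push(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify()        # a suspended pop may now proceed

    def pop(self):
        with self._cond:
            while not self._items:     # the guard: block until nonempty
                self._cond.wait()
            return self._items.pop()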
The Orca run-time system handles object replication, migration, consistency, and operation invocation. Each object can be in one of two states: single copy or replicated. An object in single-copy state exists on only one machine, so all requests for it are sent there. A replicated object is present on all machines containing a process using it, which makes read operations easier (since they can be done locally), at the expense of making updates more expensive. When an operation that modifies a replicated object is executed, it must first get a sequence number from a centralized process that issues them. Then a message is sent to each machine holding a copy of the object, telling it to execute the operation. Since all such updates bear sequence numbers, all machines just carry out the operations in sequence order, which guarantees sequential consistency.
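This update protocol can be sketched as follows (Python, with hypothetical Sequencer and Replica classes; the real Orca run-time works over message passing): writes first fetch a global sequence number, and each replica applies updates strictly in that order.

import heapq
import itertools

class Sequencer:
    # Stands in for the centralized process that issues sequence numbers.
    def __init__(self):
        self._counter = itertools.count()

    def next_seq(self):
        return next(self._counter)

class Replica:
    # One machine's copy of a replicated object.
    def __init__(self):
        self.state = []
        self._expected = 0    # next sequence number to apply
        self._pending = []    # updates that arrived out of order

    def deliver(self, seq, op):
        heapq.heappush(self._pending, (seq, op))
        # Apply updates only in sequence-number order.
        while self._pending and self._pending[0][0] == self._expected:
            _, ready = heapq.heappop(self._pending)
            ready(self.state)
            self._expected += 1

Because every replica holds back an update until all lower-numbered updates have been applied, all copies pass through the same sequence of states, which is the sequential-consistency guarantee described above.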
8.4.7 Performance
The point of building a parallel computer is to make it go faster than a uniprocessor machine. If it does not achieve that simple goal, it is not worth having. Furthermore, it should achieve the goal in a cost-effective manner. A machine that is twice as fast as a uniprocessor at 50 times the cost is not likely to be a big seller. In this section we will examine some of the performance issues associated with parallel computer architectures, starting with how you even measure performance.
 
 