Information Technology Reference
In-Depth Information
a j . The contents of the locations M 1 , M 2 , ... , M p are sorted, which creates a subsequence
in which pairs with a common address occur together and within which the pairs are sorted
by processor numbers. From Theorem 7.7.1 it follows that this step can be performed in
time O (log 2 p ) by a normal algorithm. So far no concurrent reads or writes occur.
A processor is now assigned to each pair in the sorted sequence. We consider two cases:
a) processors are reading from or b) writing to common memory. Each processor now
compares the address of its pair to that of the preceding pair. If a processor finds these
addresses to be different and case a holds, it reads the item in common memory and sets a
flag bit to 1; all other processors except the first set their flag bits to 0; the first sets its bit to 1.
(This bit is used later to distribute the value that was read.) However, if case b holds instead,
the processor writes its value. Since this processor has the lowest index of all processors and
the priority CRCW is the strongest model, the value written is the same value written by
either the common or arbitrary CRCW models.
Returning now to case a, the flag bits mark the first pair in each subsequence of pairs
that have the same address in the common memory. Associated with the leading pair is the
value read at this address. We now perform a segmented prefix computation using as the
associative rule the copy-right operation. (See Problem 2.20 .) It distributes to each pair
( a j , j ) the value the processor wished to read from the common memory. By Problem 2.21
this problem can be solved by a p -processor EREW PRAM in O (log p ) steps. The pairs
and their accompanying value are then sorted by the processor number so that the value
read from the common memory is in a location reserved for the processor that requested the
value.
7.9.3 Simulating the PRAM on a Hypercube Network
As stated above, each PRAM cycle involves reading from the global memory, performing a
local computation, and writing to the common memory. Of course, a processor need not
access common memory when given the chance. Thus, to simulate a PRAM on a network
computer, one has to take into account the fact that not all PRAM processors necessarily read
from or write to common memory locations on each cycle.
It is important to remember that the latency of network computers can be large. Thus, for
the simulation described below to be useful, each PRAM processor must be able to do a lot of
work between network accesses.
The EREW PRAM is simulated on a network computer by executing three phases, two of
which correspond to reading and writing common memory. (To simulate the CRCW PRAM,
we need only add the time given above to simulate a CRCW PRAM by an EREW PRAM.)
We simulate an access to common memory by routing a message over the network to the site
containing the simulated common memory location. It follows that a message must contain
the name of a site as well as the address of a memory location at that site. If the simulated
access is a memory read, a return message is generated containing the value of the memory
location. If it is a memory write, the transmitted message must also contain the datum to write
into the memory location. We assume that the sites are numbered consecutively from 1 to p ,
the number of processors.
The first problem to be solved is the routing of messages from source to destination pro-
cessors. This routing problem was partially addressed in Section 7.8 . The new wrinkle here is
that the mapping from source to destination sites defined by a set of messages is not necessarily
a permutation. Not all sources may send a message and not all destinations are guaranteed to
Search WWH ::




Custom Search