Parallel Computation - Models of Computation: Exploring the Power of Computing

Information Technology Reference

In-Depth Information

a j . The contents of the locations M 1 , M 2 , ... , M p are sorted, which creates a subsequence

in which pairs with a common address occur together and within which the pairs are sorted

by processor numbers. From Theorem 7.7.1 it follows that this step can be performed in

time O (log 2 p ) by a normal algorithm. So far no concurrent reads or writes occur.

A processor is now assigned to each pair in the sorted sequence. We consider two cases:

a) processors are reading from or b) writing to common memory. Each processor now

compares the address of its pair to that of the preceding pair. If a processor finds these

addresses to be different and case a holds, it reads the item in common memory and sets a

flag bit to 1; all other processors except the first set their flag bits to 0; the first sets its bit to 1.

(This bit is used later to distribute the value that was read.) However, if case b holds instead,

the processor writes its value. Since this processor has the lowest index of all processors and

the priority CRCW is the strongest model, the value written is the same value written by

either the common or arbitrary CRCW models.

Returning now to case a, the flag bits mark the first pair in each subsequence of pairs

that have the same address in the common memory. Associated with the leading pair is the

value read at this address. We now perform a segmented prefix computation using as the

associative rule the copy-right operation. (See Problem 2.20 .) It distributes to each pair

( a j , j ) the value the processor wished to read from the common memory. By Problem 2.21

this problem can be solved by a p -processor EREW PRAM in O (log p ) steps. The pairs

and their accompanying value are then sorted by the processor number so that the value

read from the common memory is in a location reserved for the processor that requested the

value.

7.9.3 Simulating the PRAM on a Hypercube Network

As stated above, each PRAM cycle involves reading from the global memory, performing a

local computation, and writing to the common memory. Of course, a processor need not

access common memory when given the chance. Thus, to simulate a PRAM on a network

computer, one has to take into account the fact that not all PRAM processors necessarily read

from or write to common memory locations on each cycle.

It is important to remember that the latency of network computers can be large. Thus, for

the simulation described below to be useful, each PRAM processor must be able to do a lot of

work between network accesses.

The EREW PRAM is simulated on a network computer by executing three phases, two of

which correspond to reading and writing common memory. (To simulate the CRCW PRAM,

we need only add the time given above to simulate a CRCW PRAM by an EREW PRAM.)

We simulate an access to common memory by routing a message over the network to the site

containing the simulated common memory location. It follows that a message must contain

the name of a site as well as the address of a memory location at that site. If the simulated

access is a memory read, a return message is generated containing the value of the memory

location. If it is a memory write, the transmitted message must also contain the datum to write

into the memory location. We assume that the sites are numbered consecutively from 1 to p ,

the number of processors.

The first problem to be solved is the routing of messages from source to destination pro-

cessors. This routing problem was partially addressed in Section 7.8 . The new wrinkle here is

that the mapping from source to destination sites defined by a set of messages is not necessarily

a permutation. Not all sources may send a message and not all destinations are guaranteed to

Models of Computation: Exploring the Power of Computing

Search WWH ::

Custom Search

Home