Parallel Computation - Models of Computation: Exploring the Power of Computing

Information Technology Reference

In-Depth Information

receive only one message. In fact, some destination may be sent many messages, which can

result in their waiting a long time for receipt.

To develop an appreciation for the various approaches to this problem, we describe an

algorithm that distributes messages from sources to destinations, though not as efficiently as

possible. Each processor prepares a message to be sent to other processors. Processors not

accessing the common memory send messages containing dummy site addresses larger than any

other address. All messages are sorted by destination address cooperatively by the processors.

As seen in Theorem 7.7.1 , they can be sorted by a normal algorithm on an p -vertex hypercube,

p = 2 d ,in O (log 2 p ) steps using Batcher's bitonic sorting network described in Section 6.8.1 .

The k

p non-dummy messages are the first k messages in this sorted list. If the sites at

which these messages reside after sorting are the sites for which they were destined, the message

routing problem is solved. Unfortunately, this is generally not the case.

To route the messages from their positions in the sorted list to their destinations, we first

identify duplicates of destination addresses and compute D , the maximum number of dupli-

cates. We then route messages in D stages. In each stage at most one of the D duplicates

of each message is routed to its destination. To identify duplicates, we assign a processor to

each message in the sorted list that compares its destination site with that of its predecessor,

setting a flag bit to 0 if equal and to 1 otherwise. To compare destinations, move messages

to adjacent vertices on the hypercube, compare, and then reverse the process. (Move them by

sorting by appropriate addresses.) The first processor also sets its flag bit to 1. A segmented

integer addition prefix operation that segments its messages with these flag bits assigns to each

message an integer (a priority ) between 1 and D that is q if the site address of this message is

the q th such address. (Prefix computations can be done on a p -vertex hypercube in O (log p )

steps. See Problem 7.23 .) A message with priority q is routed to its destination in the q th stage.

An unsegmented prefix operation with max as the operator is then used to determine D .

In the q th stage, 1

≤

D , all non-dummy messages with priority q are routed to their

destination site on the hypercube as follows:

a) one processor is assigned to each message;

≤

b) each such processor computes the gap , the difference between the destination and current

site of its message;

c) each gap g is represented as a binary d -tuple g =( g d− 1 , ... , g 0 ) ;

d) For t = d

2, ... , 0, those messages whose gap contains 2 t are sent to the site

reached by crossing the t th dimension of the hypercube.

We show that in at most O ( D log p ) steps all messages are routed to their destinations.

Let the sorted message sites form an ascending sequence. If there are k non-dummy messages,

let gap i ,0

−

1, d

−

1, be the gap of the i th message. Observe that these gaps must also

form a nondecreasing sequence. For example, shown below is a sorted set of destinations and

a corresponding sequence of gaps:

≤

−

gap i 11223 6 7 8

dest i 12457 1 3 5

i 01234 5 6 789 0 1 2 3 5

All the messages whose gaps contain 2 d− 1 must be the last messages in the sequence be-

cause the gaps would otherwise be out of order. Thus, advancing messages with these gaps by

Models of Computation: Exploring the Power of Computing

Search WWH ::

Custom Search

Home