Information Technology Reference
In-Depth Information
receive only one message. In fact, some destination may be sent many messages, which can
result in their waiting a long time for receipt.
To develop an appreciation for the various approaches to this problem, we describe an
algorithm that distributes messages from sources to destinations, though not as efficiently as
possible. Each processor prepares a message to be sent to other processors. Processors not
accessing the common memory send messages containing dummy site addresses larger than any
other address. All messages are sorted by destination address cooperatively by the processors.
As seen in Theorem 7.7.1 , they can be sorted by a normal algorithm on an p -vertex hypercube,
p = 2 d ,in O (log 2 p ) steps using Batcher's bitonic sorting network described in Section 6.8.1 .
The k
p non-dummy messages are the first k messages in this sorted list. If the sites at
which these messages reside after sorting are the sites for which they were destined, the message
routing problem is solved. Unfortunately, this is generally not the case.
To route the messages from their positions in the sorted list to their destinations, we first
identify duplicates of destination addresses and compute D , the maximum number of dupli-
cates. We then route messages in D stages. In each stage at most one of the D duplicates
of each message is routed to its destination. To identify duplicates, we assign a processor to
each message in the sorted list that compares its destination site with that of its predecessor,
setting a flag bit to 0 if equal and to 1 otherwise. To compare destinations, move messages
to adjacent vertices on the hypercube, compare, and then reverse the process. (Move them by
sorting by appropriate addresses.) The first processor also sets its flag bit to 1. A segmented
integer addition prefix operation that segments its messages with these flag bits assigns to each
message an integer (a priority ) between 1 and D that is q if the site address of this message is
the q th such address. (Prefix computations can be done on a p -vertex hypercube in O (log p )
steps. See Problem 7.23 .) A message with priority q is routed to its destination in the q th stage.
An unsegmented prefix operation with max as the operator is then used to determine D .
In the q th stage, 1
D , all non-dummy messages with priority q are routed to their
destination site on the hypercube as follows:
a) one processor is assigned to each message;
q
b) each such processor computes the gap , the difference between the destination and current
site of its message;
c) each gap g is represented as a binary d -tuple g =( g d− 1 , ... , g 0 ) ;
d) For t = d
2, ... , 0, those messages whose gap contains 2 t are sent to the site
reached by crossing the t th dimension of the hypercube.
We show that in at most O ( D log p ) steps all messages are routed to their destinations.
Let the sorted message sites form an ascending sequence. If there are k non-dummy messages,
let gap i ,0
1, d
1, be the gap of the i th message. Observe that these gaps must also
form a nondecreasing sequence. For example, shown below is a sorted set of destinations and
a corresponding sequence of gaps:
i
k
gap i 11223 6 7 8
dest i 12457 1 3 5
i 01234 5 6 789 0 1 2 3 5
All the messages whose gaps contain 2 d− 1 must be the last messages in the sequence be-
cause the gaps would otherwise be out of order. Thus, advancing messages with these gaps by
 
Search WWH ::




Custom Search