Information Technology Reference
In-Depth Information
numbers that agree in their d least significant positions. To simulate a normal hypercube algo-
rithm on the 2D mesh, in each row simulate a normal hypercube algorithm on 2 d vertices after
which in each column simulate a normal hypercube algorithm on 2 d vertices. The correctness
of this procedure follows because every adjacent pair of vertices of the simulated hypercube is
at some time located in adjacent cells of the 2D array.
From Theorem 7.7.4 it follows that hypercube exchanges across the lower d dimensions
can be simulated in time proportional to the length of a row, that is, in time O ( n ) . Similarly,
it also follows that hyperc ub e exchanges across the higher d dimensions can be simulated in
time proportional to O ( n ) . We summarize this result below.
THEOREM 7.7.5 A fully normal 2 d -dimensional hy percu b e algorithm (ascending or descending),
n = 2 2 d , can be realized in O ( n ) steps on an n
× n array of cells.
It follows from the discussion of Section 7.7 that broadca st ing, associative combining,
and the FFT algorithm can be executed on a 2D mesh in O ( n ) steps because each can be
implemented as a normal algorithm on the n -ver te x hy pe rcube.
Also, a list of n items can be sorte d on a n n
× n array in O ( n ) steps by translating
a normal merging algorithm to the n
× n array and using it recursively to cr eate a sorting
netw ork . (Se e P roblem 7.21 .) No sorting algorithm can sort in fewer than 2 m
2stepson
an m
× m array because whatever element is positioned in the lower right-hand co rn er of
the array could originate in the upper left-hand corner and have to traverse at least 2 m
2
edges to arrive there.
7.7.6 Normal Algorithms on Cube-Connected Cycles
Consider now processors connected as a d -dimensional cube-connected cycle (CCC) network
in which each ring has r = 2 k
d processors. In particular, let r be the smallest power of 2
greaterthanorequalto d ,sothat d
r< 2 d .(Thus k = O (log d ) .) We call such a CCC
network a canonical CCC network on n vertices. It has n = r 2 d vertices, d 2 d
n< ( 2 d ) 2 d .
(Thus d = O (log n ) .) We show that a fully normal algorithm can be executed efficiently on
such CCC networks.
Let each ring of the CCC network be indexed by a d -tuple corresponding to the corner
of the hypercube at which it resides. Let each processor be indexed by a ( d + k ) -tuple in
which the d low-order bits are the ring index and the k high-order bits specify the position of
a processor on the ring.
A fully normal algorithm on a canonical CCC network is implemented in two phases. In
the first phase, the ring is treated as an array and a fully normal algorithm on the k high-order
bits is simulated in O ( d ) steps. In the second phase, exchanges are made across hypercube
edges. Rotate the elements on each ring so that ring processors whose k -bit indices are 0 (call
these the lead elements ) are adjacent along the first dimension of the original hypercube. Ex-
change information between them. Now rotate the rings by one position so that lead elements
are adjacent along the second dimension of the original hypercube. The elements immediately
behind the lead elements on the rings are now adjacent along the first hypercube dimension
and are exchanged in parallel with the lead elements. (This simultaneous execution is called
pipelining .) Subsequent rotations of the rings place successive ring elements in alignment
along increasing bit positions. After O ( d ) rotations all exchanges are complete. Thus, a total
of O ( d ) time steps suffice to execute a fully normal algorithm. We have the following result.
 
Search WWH ::




Custom Search