Parallel Computation - Models of Computation: Exploring the Power of Computing

Information Technology Reference

In-Depth Information

numbers that agree in their d least significant positions. To simulate a normal hypercube algo-

rithm on the 2D mesh, in each row simulate a normal hypercube algorithm on 2 d vertices after

which in each column simulate a normal hypercube algorithm on 2 d vertices. The correctness

of this procedure follows because every adjacent pair of vertices of the simulated hypercube is

at some time located in adjacent cells of the 2D array.

From Theorem 7.7.4 it follows that hypercube exchanges across the lower d dimensions

can be simulated in time proportional to the length of a row, that is, in time O ( √ n ) . Similarly,

it also follows that hyperc ub e exchanges across the higher d dimensions can be simulated in

time proportional to O ( √ n ) . We summarize this result below.

THEOREM 7.7.5 A fully normal 2 d -dimensional hy percu b e algorithm (ascending or descending),

n = 2 2 d , can be realized in O ( √ n ) steps on an √ n

× √ n array of cells.

It follows from the discussion of Section 7.7 that broadca st ing, associative combining,

and the FFT algorithm can be executed on a 2D mesh in O ( √ n ) steps because each can be

implemented as a normal algorithm on the n -ver te x hy pe rcube.

Also, a list of n items can be sorte d on a n √ n

× √ n array in O ( √ n ) steps by translating

a normal merging algorithm to the √ n

× √ n array and using it recursively to cr eate a sorting

netw ork . (Se e P roblem 7.21 .) No sorting algorithm can sort in fewer than 2 √ m

−

2stepson

an √ m

× √ m array because whatever element is positioned in the lower right-hand co rn er of

the array could originate in the upper left-hand corner and have to traverse at least 2 √ m

−

edges to arrive there.

7.7.6 Normal Algorithms on Cube-Connected Cycles

Consider now processors connected as a d -dimensional cube-connected cycle (CCC) network

in which each ring has r = 2 k

d processors. In particular, let r be the smallest power of 2

greaterthanorequalto d ,sothat d

≥

r< 2 d .(Thus k = O (log d ) .) We call such a CCC

network a canonical CCC network on n vertices. It has n = r 2 d vertices, d 2 d

≤

n< ( 2 d ) 2 d .

(Thus d = O (log n ) .) We show that a fully normal algorithm can be executed efficiently on

such CCC networks.

Let each ring of the CCC network be indexed by a d -tuple corresponding to the corner

of the hypercube at which it resides. Let each processor be indexed by a ( d + k ) -tuple in

which the d low-order bits are the ring index and the k high-order bits specify the position of

a processor on the ring.

A fully normal algorithm on a canonical CCC network is implemented in two phases. In

the first phase, the ring is treated as an array and a fully normal algorithm on the k high-order

bits is simulated in O ( d ) steps. In the second phase, exchanges are made across hypercube

edges. Rotate the elements on each ring so that ring processors whose k -bit indices are 0 (call

these the lead elements ) are adjacent along the first dimension of the original hypercube. Ex-

change information between them. Now rotate the rings by one position so that lead elements

are adjacent along the second dimension of the original hypercube. The elements immediately

behind the lead elements on the rings are now adjacent along the first hypercube dimension

and are exchanged in parallel with the lead elements. (This simultaneous execution is called

pipelining .) Subsequent rotations of the rings place successive ring elements in alignment

along increasing bit positions. After O ( d ) rotations all exchanges are complete. Thus, a total

of O ( d ) time steps suffice to execute a fully normal algorithm. We have the following result.

≤

Models of Computation: Exploring the Power of Computing

Search WWH ::

Custom Search

Home