Parallel Computation - Models of Computation: Exploring the Power of Computing

Information Technology Reference

In-Depth Information

are 0, A 1,2 x 2 + A 1,3 x 3 ,0, A 2,2 x 2 + A 2,3 x 3 ,0, A 3,2 x 2 + A 3,3 x 3 , 0. The successive values of S 1

are 0, 0, A 1,1 x 1 + A 1,2 x 2 + A 1,3 x 3 ,0, A 2,1 x 1 + A 2,2 x 2 + A 2,3 x 3 ,0, A 3,1 x 1 + A 3,2 x 2 + A 3,3 x 3 .

The algorithm described above to compute the matrix-vector product for a 3

3matrix

clearly extends to arbitrary n

n matrices. (See Problem 7.8 .) Since the last element of an

n matrix arrives at the array after 3 n

−

2 time steps, such an array will complete its task in

3 n

−

1 time steps. A lower bound on the time for this problem (see Problem 7.9 )canbederived

by showing that the n 2 entries of the matrix A and the n entries of the matrix x must be read

to compute A x correctly by an algorithm, whether serial or not. By Theorem 7.4.1 it follows

that all systolic algorithms using n processors require n steps. Thus, the above algorithm is

nearly optimal to within a constant multiplicative factor.

THEOREM 7.5.1 There exists a linear systolic array with n cells that computes the product of an

n matrix with an n -vector in 3 n

−

1 steps, and no algorithm on such an array can do this

computation in fewer than n steps.

Since the product of two n

n matrices can be realized as n matrix-vector products with

an n

n matrix, an n -processor systolic array exists that can multiply two matrices nearly

optimally.

7.5.2 Sorting on Linear Arrays

A second application of linear systolic arrays is bubble sorting of integers. A sequential version

of the bubble sort algorithm passes over the entries in a tuple ( x 1 , x 2 , ... , x n ) from left to

right multiple times. On the first pass it finds the largest element and moves it to the rightmost

position. It applies the same procedure to the first n

−

1 elements of the resultant list, stopping

when it finds a list containing one element. This sequential procedure takes time proportional

to n +( n

+ 2 + 1 = n ( n + 1 ) / 2.

A parallel version of bubble sort, sometimes called odd-even transposition sort ,isnatu-

rally realized on a linear systolic array. The n entries of the array are placed in n cells. Let c i

be the word in the i th cell. We assume that in one unit of time two adjacent cells can read

words stored in each other's memories ( c i and c i + 1 ), compare them, and swap them if one ( c i )

is larger than the other ( c i + 1 ). The odd-even transposition sort algorithm executes n stages.

In the even-numbered stages, integers in even-numbered cells are compared with integers in

the next higher numbered cells and swapped, if larger. In the odd-numbered stages, the same

operation is performed on integers in odd-numbered cells. (See Fig. 7.9 .) We show that in n

steps the sorting is complete.

−

1 )+( n

−

2 )+

···

THEOREM 7.5.2 Bubble sort of n elements on a linear systolic array can be done in at most n steps.

Every algorithm to sort a list of n elements on a linear systolic array requires at least n

−

1 steps.

Thus, bubble sort on a linear systolic array is almost optimal.

Proof To derive the upper bound we use the zero-one principle (see Theorem 6.8.1 ), which

states that if a comparator network for inputs over an ordered set

correctly sorts all binary

inputs, it correctly sorts all inputs. The bubble sort systolic array maps directly to a com-

parator network because each of its operations is data-independent, that is, oblivious .To

see that the systolic array correctly sorts binary sequences, consider the position, r ,ofthe

rightmost 1 in the array.

Models of Computation: Exploring the Power of Computing

Search WWH ::

Custom Search

Home