Information Technology Reference
In-Depth Information
are 0,
A
1,2
x
2
+
A
1,3
x
3
,0,
A
2,2
x
2
+
A
2,3
x
3
,0,
A
3,2
x
2
+
A
3,3
x
3
, 0. The successive values of
S
1
are 0, 0,
A
1,1
x
1
+
A
1,2
x
2
+
A
1,3
x
3
,0,
A
2,1
x
1
+
A
2,2
x
2
+
A
2,3
x
3
,0,
A
3,1
x
1
+
A
3,2
x
2
+
A
3,3
x
3
.
The algorithm described above to compute the matrix-vector product for a 3
×
3matrix
clearly extends to arbitrary
n
×
n
matrices. (See Problem
7.8
.) Since the last element of an
n
×
n
matrix arrives at the array after 3
n
−
2 time steps, such an array will complete its task in
3
n
−
1 time steps. A lower bound on the time for this problem (see Problem
7.9
)canbederived
by showing that the
n
2
entries of the matrix
A
and the
n
entries of the matrix
x
must be read
to compute
A
x
correctly by an algorithm, whether serial or not. By Theorem
7.4.1
it follows
that all systolic algorithms using
n
processors require
n
steps. Thus, the above algorithm is
nearly optimal to within a constant multiplicative factor.
THEOREM
7.5.1
There exists a linear systolic array with
n
cells that computes the product of an
n
×
n
matrix with an
n
-vector in
3
n
−
1
steps, and no algorithm on such an array can do this
computation in fewer than
n
steps.
Since the product of two
n
×
n
matrices can be realized as
n
matrix-vector products with
an
n
n
matrix, an
n
-processor systolic array exists that can multiply two matrices nearly
optimally.
×
7.5.2 Sorting on Linear Arrays
A second application of linear systolic arrays is bubble sorting of integers. A sequential version
of the
bubble sort
algorithm passes over the entries in a tuple
(
x
1
,
x
2
,
...
,
x
n
)
from left to
right multiple times. On the first pass it finds the largest element and moves it to the rightmost
position. It applies the same procedure to the first
n
−
1 elements of the resultant list, stopping
when it finds a list containing one element. This sequential procedure takes time proportional
to
n
+(
n
+
2
+
1
=
n
(
n
+
1
)
/
2.
A parallel version of bubble sort, sometimes called
odd-even transposition sort
,isnatu-
rally realized on a linear systolic array. The
n
entries of the array are placed in
n
cells. Let
c
i
be the word in the
i
th cell. We assume that in one unit of time two adjacent cells can read
words stored in each other's memories (
c
i
and
c
i
+
1
), compare them, and swap them if one (
c
i
)
is larger than the other (
c
i
+
1
). The odd-even transposition sort algorithm executes
n
stages.
In the even-numbered stages, integers in even-numbered cells are compared with integers in
the next higher numbered cells and swapped, if larger. In the odd-numbered stages, the same
operation is performed on integers in odd-numbered cells. (See Fig.
7.9
.) We show that in
n
steps the sorting is complete.
−
1
)+(
n
−
2
)+
···
THEOREM
7.5.2
Bubble sort of
n
elements on a linear systolic array can be done in at most
n
steps.
Every algorithm to sort a list of
n
elements on a linear systolic array requires at least
n
−
1
steps.
Thus, bubble sort on a linear systolic array is almost optimal.
Proof
To derive the upper bound we use the zero-one principle (see Theorem
6.8.1
), which
states that if a comparator network for inputs over an ordered set
correctly sorts all binary
inputs, it correctly sorts all inputs. The bubble sort systolic array maps directly to a com-
parator network because each of its operations is data-independent, that is,
oblivious
.To
see that the systolic array correctly sorts binary sequences, consider the position,
r
,ofthe
rightmost 1 in the array.
A
Search WWH ::
Custom Search