Information Technology Reference
In-Depth Information
components of A and x must be read. This algorithm is multilective because it supplies each
component of x n times.
PREFIX COMPUTATION ON AN H-TREE TheH-treeisalsoaneffectivewaytodoaprefixcom-
putation. Prefix computations (let
be the associative operator) are naturally executed on
trees. A tree-based prefix computation is described in Problem 7.31 . One datum enters the
root of the tree; the rest travel up from the leaves. When implemented on an H-tree, this algo-
rithm uses area O ( n ) on n inputs and time O (log n ) , giving an AT product of O ( n log n ) .
This algorithm is semellective.
This algorithm can be converted into an AT -optimal algorithm using a technique similar
to that used above. We subdivide the input n -tuple x into (log 2 n ) -tuples, of which there are
( n/ log 2 n ) , and serially form the associative combination of the (log 2 n ) components of each
tuple using
in (log 2 n ) steps. We then perform the prefix computation on these ( n/ log 2 n )
results. To complete the computation, for 1
j
( n/ log 2 n )
1 we reread each of the
original (log 2 n ) -tuples in parallel and add the ( j
1 ) st result (the zeroth result is 0) to the
first component of the j th (log 2 n ) -tuple, and then serially perform a prefix computation on
these new (log 2 n ) -tuples.
We inc rea s e ( n/ log 2 n ) to the next power of 4 (adding inputs whose corresponding out-
puts are ignored) and embed the tree of Fig. 7.23 directly into an H-tree. The initial associative
combination of (log 2 n ) -tuples and the final prefix computation on (log 2 n ) -tuples are done
at vertices of the H-tree that are I/O vertices of the prefix tree. This algorithm takes time
O (log n ) on the initial and final phases as well as on the prefix computation. Since the area of
the layout is O ( n/ log 2 n ) and every one of the n inputs must be read, its area-time product,
AT ,is O ( n ) which is optimal. This algorithm is multilective since each input is supplied
twice.
12.5.2 Multi-dimensional Mesh Layouts
As explained in Section 7.5 , many important problems can be solved with systolic arrays. If
the cells of one- and two-dimensional systolic arrays are of fixed size and quasiplanar, they can
be embedded directly onto a chip with area proportional to the number of cells. Applying the
results of Theorems 7.5.1 , 7.5.2 ,and 7.5.3 we have the following facts concerning the area and
time for three important problems when realized by such arrays.
Problem
Dimensions
Area
Time
n
×
n Matrix-Vector Multiplication
O ( n )
O ( n )
1D
Bubble Sort of n items
O ( n )
O ( n )
1D
Batcher's Odd-Even Sorting of n items
1D
O ( n )
O ( n )
n
× n Matrix-Matrix Multiplication
O ( n )
O ( n )
2D
Fully normal algorithms for problems such as shifting, summing, broadcasting, and fast
Fourier transform on n = 2 2 d inputs can each be done in O (log n ) steps on the n -vertex hy-
percube or the canonical cube-connected cycles network on n ver ti ces. From Theorems 7.7.4
and 7.7.5 these problems can also be solved in O ( n ) and O ( n ) steps, respectively, on n -
vertex one- and two-dimensional systolic arrays. We summarize these facts in Figure 12.3 .
 
Search WWH ::




Custom Search