Information Technology Reference
In-Depth Information
components of
A
and
x
must be read. This algorithm is multilective because it supplies each
component of
x
n
times.
PREFIX COMPUTATION ON AN H-TREE
TheH-treeisalsoaneffectivewaytodoaprefixcom-
putation. Prefix computations (let
be the associative operator) are naturally executed on
trees. A tree-based prefix computation is described in Problem
7.31
. One datum enters the
root of the tree; the rest travel up from the leaves. When implemented on an H-tree, this algo-
rithm uses area
O
(
n
)
on
n
inputs and time
O
(log
n
)
, giving an
AT
product of
O
(
n
log
n
)
.
This algorithm is semellective.
This algorithm can be converted into an
AT
-optimal algorithm using a technique similar
to that used above. We subdivide the input
n
-tuple
x
into
(log
2
n
)
-tuples, of which there are
(
n/
log
2
n
)
, and serially form the associative combination of the
(log
2
n
)
components of each
tuple using
in
(log
2
n
)
steps. We then perform the prefix computation on these
(
n/
log
2
n
)
results. To complete the computation, for 1
≤
j
≤
(
n/
log
2
n
)
−
1 we reread each of the
original
(log
2
n
)
-tuples in parallel and add the
(
j
1
)
st result (the zeroth result is 0) to the
first component of the
j
th
(log
2
n
)
-tuple, and then serially perform a prefix computation on
these new
(log
2
n
)
-tuples.
We inc rea s e
(
n/
log
2
n
)
to the next power of 4 (adding inputs whose corresponding out-
puts are ignored) and embed the tree of Fig.
7.23
directly into an H-tree. The initial associative
combination of
(log
2
n
)
-tuples and the final prefix computation on
(log
2
n
)
-tuples are done
at vertices of the H-tree that are I/O vertices of the prefix tree. This algorithm takes time
O
(log
n
)
on the initial and final phases as well as on the prefix computation. Since the area of
the layout is
O
(
n/
log
2
n
)
and every one of the
n
inputs must be read, its area-time product,
AT
,is
O
(
n
)
which is optimal. This algorithm is multilective since each input is supplied
twice.
−
12.5.2 Multi-dimensional Mesh Layouts
As explained in Section
7.5
, many important problems can be solved with systolic arrays. If
the cells of one- and two-dimensional systolic arrays are of fixed size and quasiplanar, they can
be embedded directly onto a chip with area proportional to the number of cells. Applying the
results of Theorems
7.5.1
,
7.5.2
,and
7.5.3
we have the following facts concerning the area and
time for three important problems when realized by such arrays.
Problem
Dimensions
Area
Time
n
×
n
Matrix-Vector Multiplication
O
(
n
)
O
(
n
)
1D
Bubble Sort of
n
items
O
(
n
)
O
(
n
)
1D
Batcher's Odd-Even Sorting of
n
items
1D
O
(
n
)
O
(
n
)
√
n
×
√
n
Matrix-Matrix Multiplication
O
(
√
n
)
O
(
n
)
2D
Fully normal algorithms for problems such as shifting, summing, broadcasting, and fast
Fourier transform on
n
=
2
2
d
inputs can each be done in
O
(log
n
)
steps on the
n
-vertex hy-
percube or the canonical cube-connected cycles network on
n
ver
ti
ces. From Theorems
7.7.4
and
7.7.5
these problems can also be solved in
O
(
n
)
and
O
(
√
n
)
steps, respectively, on
n
-
vertex one- and two-dimensional systolic arrays. We summarize these facts in Figure
12.3
.
Search WWH ::
Custom Search