VLSI Models of Computation - Models of Computation: Exploring the Power of Computing

Information Technology Reference

In-Depth Information

components of A and x must be read. This algorithm is multilective because it supplies each

component of x n times.

PREFIX COMPUTATION ON AN H-TREE TheH-treeisalsoaneffectivewaytodoaprefixcom-

putation. Prefix computations (let

be the associative operator) are naturally executed on

trees. A tree-based prefix computation is described in Problem 7.31 . One datum enters the

root of the tree; the rest travel up from the leaves. When implemented on an H-tree, this algo-

rithm uses area O ( n ) on n inputs and time O (log n ) , giving an AT product of O ( n log n ) .

This algorithm is semellective.

This algorithm can be converted into an AT -optimal algorithm using a technique similar

to that used above. We subdivide the input n -tuple x into (log 2 n ) -tuples, of which there are

( n/ log 2 n ) , and serially form the associative combination of the (log 2 n ) components of each

tuple using

in (log 2 n ) steps. We then perform the prefix computation on these ( n/ log 2 n )

results. To complete the computation, for 1

≤

( n/ log 2 n )

−

1 we reread each of the

original (log 2 n ) -tuples in parallel and add the ( j

1 ) st result (the zeroth result is 0) to the

first component of the j th (log 2 n ) -tuple, and then serially perform a prefix computation on

these new (log 2 n ) -tuples.

We inc rea s e ( n/ log 2 n ) to the next power of 4 (adding inputs whose corresponding out-

puts are ignored) and embed the tree of Fig. 7.23 directly into an H-tree. The initial associative

combination of (log 2 n ) -tuples and the final prefix computation on (log 2 n ) -tuples are done

at vertices of the H-tree that are I/O vertices of the prefix tree. This algorithm takes time

O (log n ) on the initial and final phases as well as on the prefix computation. Since the area of

the layout is O ( n/ log 2 n ) and every one of the n inputs must be read, its area-time product,

AT ,is O ( n ) which is optimal. This algorithm is multilective since each input is supplied

twice.

−

12.5.2 Multi-dimensional Mesh Layouts

As explained in Section 7.5 , many important problems can be solved with systolic arrays. If

the cells of one- and two-dimensional systolic arrays are of fixed size and quasiplanar, they can

be embedded directly onto a chip with area proportional to the number of cells. Applying the

results of Theorems 7.5.1 , 7.5.2 ,and 7.5.3 we have the following facts concerning the area and

time for three important problems when realized by such arrays.

Problem

Dimensions

Area

Time

n Matrix-Vector Multiplication

O ( n )

Bubble Sort of n items

O ( n )

Batcher's Odd-Even Sorting of n items

O ( n )

√ n

× √ n Matrix-Matrix Multiplication

O ( √ n )

O ( n )

Fully normal algorithms for problems such as shifting, summing, broadcasting, and fast

Fourier transform on n = 2 2 d inputs can each be done in O (log n ) steps on the n -vertex hy-

percube or the canonical cube-connected cycles network on n ver ti ces. From Theorems 7.7.4

and 7.7.5 these problems can also be solved in O ( n ) and O ( √ n ) steps, respectively, on n -

vertex one- and two-dimensional systolic arrays. We summarize these facts in Figure 12.3 .

Models of Computation: Exploring the Power of Computing

Search WWH ::

Custom Search

Home