Information Technology Reference
In-Depth Information
The solution to this recurrence is S ( k )=( b + c ) 2 k
c as the reader can verify. Since
H k has n = 4 k
leaves and area A n =( S ( k )) 2 , it follows that an n -vertex H-tree has area
A n
n ( b + c ) 2 .
To appreciate the importance of the H-tree construction, observe that its leaves are interior
to the layout. Given the usual drawing of a binary tree one is tempted to place its leaves along
the boundary of a chip. If this boundary is convex, the area of a binary tree on n leaves must
be at least proportional to n log n . (See Problem 12.3 .)
MATRIX-VECTOR MULTIPLICATION ON AN H-TREE We now describe an algorithm based on an
H-tree that multiplies an n
n matrix A with an n -vector x , n = 2 k , by forming the n inner
products of the n rows of A with x . (Matrix-vector multiplication is defined in Section 6.2.2 .)
This algorithm assumes that one unit of time is taken to store one piece of data and to perform
an addition or multiplication on data.
On the first time step of our algorithm the components of the vector x are supplied in
parallel to the n leaves of the tree and stored there. On the second time step components of
the first row of A are also provided in parallel to the leaves. In the third time step the product
of corresponding components of x and the first row of A are multiplied. In k =log 2 n
additional time steps these products are added in the H-tree and the result supplied as output.
In the next two steps the second row of A is supplied as input and its components multiplied
by those of x .After k additional steps these products are summed and the result generated
as output. This process is repeated for each of the remaining rows of A . This algorithm is
semellective.
Since we treat the time to add and multiply as the basis for measuring the time required
by this H-tree, each inner product requires O (log n ) time and the n inner products require
O ( n log n ) time. However, if each addition vertex in this tree can also store its result (thereby
causing a slight increase in area), a new row of A can be supplied to the H-tree in each unit
of time (we say the period of the computation is P = 1) because a series of partial results
can move through the tree in parallel. This is an example of pipelining. In this case the time
to perform the n inner products is O ( n +log n )= O ( n ) . If pipelining is not used, this
matrix-vector multiplication algorithm does not make the best use of area and time, as we now
show.
Even without pipelining there exists an AT optimal algorithm for matrix-vector multipli-
cation. Let n be such that n/ log 2 n is a power of 4. Decompose each row of A as well as x
into (log 2 n ) -tuples. This is equivalent to representing the n
×
×
n matrix A by a n
×
( n/ log 2 n )
matrix B whose entries are 1
log 2 n matrices (equivalently, (log 2 n ) -vectors) and to repre-
senting x by an ( n/ log 2 n ) -vector y whose components are (log 2 n ) -vectors.
We implement this computation on an H-tree with O ( n/ log n ) area. To compute the
inner product of A 's j th row with x , sequentially supply to each H-tree leaf the components
of one (log 2 n ) -vector of y and the corresponding vector in the j th row of B . Supply the
individual components of these (log 2 n ) -vectors in alternate cycles. After a leaf vertex receives
the corresponding components of A and x , it multiplies them and adds the result to its running
sum. Upon completion of an inner product of two (log 2 n ) -vectors, the leaf vertices make their
values available to be added in the H-tree in O (log n ) steps. After n of these operations, all n
inner products of A x are computed.
This algorithm uses T = O ( n log n ) time but only has area A = O ( n/ log n ) .Thus,
its area-time product satisfies AT = O ( n 2 ) , which is optimal since each of the n 2 + n
×
Search WWH ::




Custom Search