Information Technology Reference
In-Depth Information
r k
c i , j =
a i , q ×
b q , j
q = 1
As in Theorem 11.5.2 , c i , j is computed as a running sum, as suggested in Fig. 11.4 ,
where each vertex is associated with an r k ×
r k submatrix. It follows that 3 r k pebbles at
level k or less (not including the reserve pebbles) suffice to hold pebbles on submatrices a i , q ,
b q , j and the running sum. To compute a product a i , q ×
b q , j ,werepresent a i , q and b q , j as
block matrices with blocks that are r k− 1
r k− 1 matrices. Again, we form this product as
suggested in Fig. 11.4 ,using3 r k− 1 pebbles at levels k
×
1 or lower. This process is repeated
until we encounter a product of r 1
r 1 matrices, which is then pebbled according to the
procedure given in the proof of Theorem 11.5.2 .
Let's now determine the number of I/O and computation steps at each level. Since all
non-input vertices of G are pebbled once, the number of computation steps is O ( n 3 ) .I/O
operations are done only on input and output vertices. Once an output vertex has been
pebbled at the first level, reserve pebbles can be used to place a level- L pebble on it. Thus
one output is done on each of the n 2 output vertices at each level.
We now count the I/O operations on input vertices starting with level k . n
×
×
n matrices
A , B ,and C contain r k ×
r k matrices, where r k divides n .Eachofthe ( n/r k ) 2 submatrices
a i , q and b q , j is used in ( n/r k ) inner products and at most r k I/O operations at level k are
performed on them. (If most of the s k pebbles at level k or less are at lower levels, fewer
level- k I/O operations will be performed.) Thus, at most 2 ( n/r k ) 2 ( n/r k ) r k
= 2 n 2 /r k
I/O operations are performed at level k .Inturn,eachofthe r k ×
r k matrices contains
( r k /r k− 1 ) 2 r k− 1 ×
r k− 1 matrices; each of these is involved in ( r k /r k− 1 ) inner products
each of which requires at most r k− 1
I/O operations. Since there are at most ( n/r k− 1 ) 2
r k− 1 submatrices in each of A , B ,and C ,atmost2 n 3 /r k− 1 I/O operations are
performed at level k
r k− 1 ×
1. Continuing in this fashion, at most 2 n 3 /r l I/O operations are
performed at level l for 2
( s i
l
k .Since r l
i + 1 ) / 12, we have the desired
conclusion.
Since the above pebbling strategy does not place pebbles at level 2 or above on any vertex
except input and output vertices, it applies in the I/O-limited case. The lower bound follows
from Lemma 11.3.1 and Theorem 11.5.2 .
11.5.3 The Fast Fourier Transform
The fast Fourier transform (FFT) algorithm is described in Section 6.7.3 (an FFT graph is
giveninFig. 11.1 ). A lower bound is obtained by the Hong-Kung method for the FFT by
deriving an upper bound on the S -span of the FFT graph. In this section all logarithms have
base 2.
LEMMA 11.5.2 The S -span of the FFT graph F ( d )
on n = 2 d
inputs satisfies ρ ( S , G )
2 S log S when S
n .
Proof ρ ( S , G ) is the maximum number of vertices of G that can be pebbled with S red
pebbles from an initial placement of these pebbles, maximized over all such initial place-
ments. G contains many two-input FFT (butterfly) graphs, as shown in Fig. 11.8 .If v 1
and v 2 are the output vertices in such a two-input FFT and if one of them is pebbled, we
Search WWH ::




Custom Search