Memory-Hierarchy Tradeoffs - Models of Computation: Exploring the Power of Computing

Information Technology Reference

In-Depth Information

r k

c i , j =

a i , q ×

b q , j

q = 1

As in Theorem 11.5.2 , c i , j is computed as a running sum, as suggested in Fig. 11.4 ,

where each vertex is associated with an r k ×

r k submatrix. It follows that 3 r k pebbles at

level k or less (not including the reserve pebbles) suffice to hold pebbles on submatrices a i , q ,

b q , j and the running sum. To compute a product a i , q ×

b q , j ,werepresent a i , q and b q , j as

block matrices with blocks that are r k− 1

r k− 1 matrices. Again, we form this product as

suggested in Fig. 11.4 ,using3 r k− 1 pebbles at levels k

−

1 or lower. This process is repeated

until we encounter a product of r 1

r 1 matrices, which is then pebbled according to the

procedure given in the proof of Theorem 11.5.2 .

Let's now determine the number of I/O and computation steps at each level. Since all

non-input vertices of G are pebbled once, the number of computation steps is O ( n 3 ) .I/O

operations are done only on input and output vertices. Once an output vertex has been

pebbled at the first level, reserve pebbles can be used to place a level- L pebble on it. Thus

one output is done on each of the n 2 output vertices at each level.

We now count the I/O operations on input vertices starting with level k . n

n matrices

A , B ,and C contain r k ×

r k matrices, where r k divides n .Eachofthe ( n/r k ) 2 submatrices

a i , q and b q , j is used in ( n/r k ) inner products and at most r k I/O operations at level k are

performed on them. (If most of the s k pebbles at level k or less are at lower levels, fewer

level- k I/O operations will be performed.) Thus, at most 2 ( n/r k ) 2 ( n/r k ) r k

= 2 n 2 /r k

I/O operations are performed at level k .Inturn,eachofthe r k ×

r k matrices contains

( r k /r k− 1 ) 2 r k− 1 ×

r k− 1 matrices; each of these is involved in ( r k /r k− 1 ) inner products

each of which requires at most r k− 1

I/O operations. Since there are at most ( n/r k− 1 ) 2

r k− 1 submatrices in each of A , B ,and C ,atmost2 n 3 /r k− 1 I/O operations are

performed at level k

r k− 1 ×

1. Continuing in this fashion, at most 2 n 3 /r l I/O operations are

performed at level l for 2

−

( s i −

≤

k .Since r l ≥

i + 1 ) / 12, we have the desired

conclusion.

Since the above pebbling strategy does not place pebbles at level 2 or above on any vertex

except input and output vertices, it applies in the I/O-limited case. The lower bound follows

from Lemma 11.3.1 and Theorem 11.5.2 .

11.5.3 The Fast Fourier Transform

The fast Fourier transform (FFT) algorithm is described in Section 6.7.3 (an FFT graph is

giveninFig. 11.1 ). A lower bound is obtained by the Hong-Kung method for the FFT by

deriving an upper bound on the S -span of the FFT graph. In this section all logarithms have

base 2.

LEMMA 11.5.2 The S -span of the FFT graph F ( d )

on n = 2 d

inputs satisfies ρ ( S , G )

≤

2 S log S when S

n .

Proof ρ ( S , G ) is the maximum number of vertices of G that can be pebbled with S red

pebbles from an initial placement of these pebbles, maximized over all such initial place-

ments. G contains many two-input FFT (butterfly) graphs, as shown in Fig. 11.8 .If v 1

and v 2 are the output vertices in such a two-input FFT and if one of them is pebbled, we

≤

Models of Computation: Exploring the Power of Computing

Search WWH ::

Custom Search

Home