Information Technology Reference
In-Depth Information
r
k
c
i
,
j
=
a
i
,
q
×
b
q
,
j
q
=
1
As in Theorem
11.5.2
,
c
i
,
j
is computed as a running sum, as suggested in Fig.
11.4
,
where each vertex is associated with an
r
k
×
r
k
submatrix. It follows that 3
r
k
pebbles at
level
k
or less (not including the reserve pebbles) suffice to hold pebbles on submatrices
a
i
,
q
,
b
q
,
j
and the running sum. To compute a product
a
i
,
q
×
b
q
,
j
,werepresent
a
i
,
q
and
b
q
,
j
as
block matrices with blocks that are
r
k−
1
r
k−
1
matrices. Again, we form this product as
suggested in Fig.
11.4
,using3
r
k−
1
pebbles at levels
k
×
−
1 or lower. This process is repeated
until we encounter a product of
r
1
r
1
matrices, which is then pebbled according to the
procedure given in the proof of Theorem
11.5.2
.
Let's now determine the number of I/O and computation steps at each level. Since all
non-input vertices of
G
are pebbled once, the number of computation steps is
O
(
n
3
)
.I/O
operations are done only on input and output vertices. Once an output vertex has been
pebbled at the first level, reserve pebbles can be used to place a level-
L
pebble on it. Thus
one output is done on each of the
n
2
output vertices at each level.
We now count the I/O operations on input vertices starting with level
k
.
n
×
×
n
matrices
A
,
B
,and
C
contain
r
k
×
r
k
matrices, where
r
k
divides
n
.Eachofthe
(
n/r
k
)
2
submatrices
a
i
,
q
and
b
q
,
j
is used in
(
n/r
k
)
inner products and at most
r
k
I/O operations at level
k
are
performed on them. (If most of the
s
k
pebbles at level
k
or less are at lower levels, fewer
level-
k
I/O operations will be performed.) Thus, at most 2
(
n/r
k
)
2
(
n/r
k
)
r
k
=
2
n
2
/r
k
I/O operations are performed at level
k
.Inturn,eachofthe
r
k
×
r
k
matrices contains
(
r
k
/r
k−
1
)
2
r
k−
1
×
r
k−
1
matrices; each of these is involved in
(
r
k
/r
k−
1
)
inner products
each of which requires at most
r
k−
1
I/O operations. Since there are at most
(
n/r
k−
1
)
2
r
k−
1
submatrices in each of
A
,
B
,and
C
,atmost2
n
3
/r
k−
1
I/O operations are
performed at level
k
r
k−
1
×
1. Continuing in this fashion, at most 2
n
3
/r
l
I/O operations are
performed at level
l
for 2
−
(
s
i
−
≤
l
≤
k
.Since
r
l
≥
i
+
1
)
/
12, we have the desired
conclusion.
Since the above pebbling strategy does not place pebbles at level 2 or above on any vertex
except input and output vertices, it applies in the I/O-limited case. The lower bound follows
from Lemma
11.3.1
and Theorem
11.5.2
.
11.5.3 The Fast Fourier Transform
The fast Fourier transform (FFT) algorithm is described in Section
6.7.3
(an FFT graph is
giveninFig.
11.1
). A lower bound is obtained by the Hong-Kung method for the FFT by
deriving an upper bound on the
S
-span of the FFT graph. In this section all logarithms have
base 2.
LEMMA
11.5.2
The
S
-span of the FFT graph
F
(
d
)
on
n
=
2
d
inputs satisfies
ρ
(
S
,
G
)
≤
2
S
log
S
when
S
n
.
Proof
ρ
(
S
,
G
)
is the maximum number of vertices of
G
that can be pebbled with
S
red
pebbles from an initial placement of these pebbles, maximized over all such initial place-
ments.
G
contains many two-input FFT (butterfly) graphs, as shown in Fig.
11.8
.If
v
1
and
v
2
are the output vertices in such a two-input FFT and if one of them is pebbled, we
≤
Search WWH ::
Custom Search