Digital Signal Processing Reference
In-Depth Information
a
D
A
B
C
2D
b
...
A
0
B
0
C
1
A
3
B
3
C
4
A
6
B
6
C
7
...
A
1
B
1
C
2
A
4
B
4
C
5
A
7
B
7
...
A
2
B
2
C
3
A
5
B
5
C
6
Fig. 14
(
a
) The original DFG. (
b
) The infinitely unfolded DFG
a
b
Nl+v
H
U
P
U
D
U
w(e)
D
V
D
F
(U V)
H
V
(
a
) An edge
U
e
Fig. 15
→
V
with
w
(
e
)
delays. (
b
) The corresponding folded datapath. The data
begin at the functional unit
H
U
which has
P
U
pipelining stage, pass through
D
F
(
e
→
U
V
)
delays,
and are switched into the functional unit
H
V
at the time instances
Nl
+
v
silicon area. In general, folding can be used to reduce the number of hardware
functional units by a factor of
N
at the expense of increasing the computation time
by a factor of
N
.
Consider the edge
e
connecting the nodes
U
and
V
with
w
delays, as shown in
at the time units
Nl
(
e
)
v
, respectively, where
u
and
v
are folding orders
of the nodes
U
and
V
that satisfy 0
+
u
and
Nl
+
1and
N
is the folding factor.
The folding order of a node is the time partition to which the node is scheduled
to execute in hardware. The functional units that execute the nodes
U
and
V
are
denoted as
H
U
and
H
V
, respectively. If
H
U
is pipelined by
P
U
stages, then the result
of the
l
-th iteration of the node
U
is available at the time unit
Nl
≤
u
,
v
≤
N
−
+
u
+
P
U
. Since the
e
→
edge
U
V
has
w
(
e
)
delays, the result of the
l
-th iteration of the node
U
is used
by the
(
l
+
w
(
e
))
-th iteration of the node
V
, which is executed at
N
(
l
+
w
(
e
)) +
v
.
Therefore, the result must be stored for
e
→
D
F
(
U
V
)=[
N
(
l
+
w
(
e
))+
v
]
−
[
Nl
+
P
U
+
u
]=
Nw
(
e
)
−
P
U
+
v
−
u
(9)
e
→
time units, which is independent of the iteration number
l
. The edge
U
V
is
e
→
implemented as a path from
H
U
to
H
V
in the architecture with
D
F
(
U
V
)
delays,
and data on this path are input to
H
V
at
Nl
+