Digital Signal Processing Reference
In-Depth Information
a
D
A
B
C
2D
b
...
A 0
B 0
C 1
A 3
B 3
C 4
A 6
B 6
C 7
...
A 1
B 1
C 2
A 4
B 4
C 5
A 7
B 7
...
A 2
B 2
C 3
A 5
B 5
C 6
Fig. 14
( a ) The original DFG. ( b ) The infinitely unfolded DFG
a
b
Nl+v
H U
P U D
U
w(e) D
V
D F (U V)
H V
( a ) An edge U e
Fig. 15
V with w ( e ) delays. ( b ) The corresponding folded datapath. The data
begin at the functional unit H U which has P U pipelining stage, pass through D F (
e
U
V
)
delays,
and are switched into the functional unit H V at the time instances Nl
+
v
silicon area. In general, folding can be used to reduce the number of hardware
functional units by a factor of N at the expense of increasing the computation time
by a factor of N .
Consider the edge e connecting the nodes U and V with w
delays, as shown in
Fig. 15 a . Let the executions of the l -th iteration of the nodes U and V be scheduled
at the time units Nl
(
e
)
v , respectively, where u and v are folding orders
of the nodes U and V that satisfy 0
+
u and Nl
+
1and N is the folding factor.
The folding order of a node is the time partition to which the node is scheduled
to execute in hardware. The functional units that execute the nodes U and V are
denoted as H U and H V , respectively. If H U is pipelined by P U stages, then the result
of the l -th iteration of the node U is available at the time unit Nl
u
,
v
N
+
u
+
P U . Since the
e
edge U
V has w
(
e
)
delays, the result of the l -th iteration of the node U is used
by the
(
l
+
w
(
e
))
-th iteration of the node V , which is executed at N
(
l
+
w
(
e
)) +
v .
Therefore, the result must be stored for
e
D F (
U
V
)=[
N
(
l
+
w
(
e
))+
v
] [
Nl
+
P U +
u
]=
Nw
(
e
)
P U +
v
u
(9)
e
time units, which is independent of the iteration number l . The edge U
V is
e
implemented as a path from H U to H V in the architecture with D F (
U
V
)
delays,
and data on this path are input to H V at Nl
+
v ,asinFig. 15 b .
 
 
 
Search WWH ::




Custom Search