Digital Signal Processing Reference
In-Depth Information
Table 8.1 Select signals for multiplexers of the text example
Clock cycle
sel 1
sel 2
0
0
0
1
0
1
2
1
0
3
1
1
signal processing applications may require folding of algorithms for effective hardware mapping
that minimizes area. The folding can be accomplished by using a folding transformation. The
mathematical formulation of folding transformations is introduced in [21, 22]. A brief description is
given in this section.
For a given folding order and a folding set for a DFG, the folding transformation computes the
number of delays on each edge in the folded graph. The folded architecture periodically executes
operations of the DFG according to the folding set. Figure 8.17 shows an edge that connects nodes U i
and V j with W ij delays. The W ij delays on the edge U i !
V j signify that the output of node U i is used
by node V j afterW ij cycles or iterations in the original DFG. If the DFG is folded by a folding factor N,
then the folded architecture executes each iteration of the DFG in N cycles. All the nodes of type U
and Vin the DFG are scheduled on computational units H u andH v , respectively, in clock cycles u i and
v j such that 0 u i ; v i N 1. If nodes U and Vare of the same type, they may be scheduled on the
same computational unit in different clock cycles.
For the folded architecture, node U i is scheduled in clock cycle u i in the current iteration and node
V j is scheduled in v j clock cycle in the W ij iteration. In the original DFG the output of node U i is used
after W ij clock cycles, and now in the folded architecture, as the node U i is scheduled in the u i clock
cycle of the current iteration and it is used in the W ij iteration in the original DFG, as each iteration
takes N clock cycle thus the folded architecture starts executing the W ij iteration in the N W ij clock
cycle, where node V j is scheduled in the v j clock cycle in this iteration. This implies that, with respect
to the current iteration, node V j is scheduled in the N W ij þ v j clock cycle, so in the folded
architecture the result from node U i needs to be stored for F ij number of register clock cycles, where:
F ij ¼ N W ij þ v j u i
If the node of type U is mapped on a computational unit H u with P u pipeline stages, these delays
will also be incorporated in the above equation, and the new equation becomes:
F ij ¼ N W ij þ v j u i P u
u i
z - W
v j
ij
Figure 8.17 An edge in a DFG connecting nodes U i and V j with W ij delays
Search WWH ::




Custom Search