Unfolding and Folding of Architectures - Digital Design of Signal Processing Systems: A Practical Approach

Digital Signal Processing Reference

In-Depth Information

Table 8.1 Select signals for multiplexers of the text example

Clock cycle

sel 1

sel 2

0

1

0

1

2

1

0

3

1

signal processing applications may require folding of algorithms for effective hardware mapping

that minimizes area. The folding can be accomplished by using a folding transformation. The

mathematical formulation of folding transformations is introduced in [21, 22]. A brief description is

given in this section.

For a given folding order and a folding set for a DFG, the folding transformation computes the

number of delays on each edge in the folded graph. The folded architecture periodically executes

operations of the DFG according to the folding set. Figure 8.17 shows an edge that connects nodes U i

and V j with W ij delays. The W ij delays on the edge U i !

V j signify that the output of node U i is used

by node V j afterW ij cycles or iterations in the original DFG. If the DFG is folded by a folding factor N,

then the folded architecture executes each iteration of the DFG in N cycles. All the nodes of type U

and Vin the DFG are scheduled on computational units H u andH v , respectively, in clock cycles u i and

v j such that 0 u i ; v i N 1. If nodes U and Vare of the same type, they may be scheduled on the

same computational unit in different clock cycles.

For the folded architecture, node U i is scheduled in clock cycle u i in the current iteration and node

V j is scheduled in v j clock cycle in the W ij iteration. In the original DFG the output of node U i is used

after W ij clock cycles, and now in the folded architecture, as the node U i is scheduled in the u i clock

cycle of the current iteration and it is used in the W ij iteration in the original DFG, as each iteration

takes N clock cycle thus the folded architecture starts executing the W ij iteration in the N W ij clock

cycle, where node V j is scheduled in the v j clock cycle in this iteration. This implies that, with respect

to the current iteration, node V j is scheduled in the N W ij þ v j clock cycle, so in the folded

architecture the result from node U i needs to be stored for F ij number of register clock cycles, where:

F ij ¼ N W ij þ v j u i

If the node of type U is mapped on a computational unit H u with P u pipeline stages, these delays

will also be incorporated in the above equation, and the new equation becomes:

F ij ¼ N W ij þ v j u i P u

u i

z - W

v j

ij

Figure 8.17 An edge in a DFG connecting nodes U i and V j with W ij delays

Digital Design of Signal Processing Systems: A Practical Approach

Search WWH ::

Custom Search

Home