Digital Signal Processing Reference
Figure 8.5 Unfolding transformations. (a) Second-order TDF structure. (b) Unfolding with a factor of 2
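The effect of the unfolding in Figure 8.5 can be sketched in C. This is only an illustration under stated assumptions: the type `tdf2_t` and both function names are hypothetical, and the difference equation is assumed to be y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] + a1 y[n-1] + a2 y[n-2], with the feedback terms added. The first function is the original structure, which handles one sample per iteration. The second is unfolded by a factor of 2, so each iteration consumes x[2n] and x[2n+1] and produces y[2n] and y[2n+1].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for a second-order TDF section. */
typedef struct {
    double b0, b1, b2;  /* feedforward coefficients */
    double a1, a2;      /* feedback coefficients (assumed to be added) */
    double s1, s2;      /* the two delay registers */
} tdf2_t;

/* Original structure: one input sample in, one output sample out. */
void tdf2_run(tdf2_t *f, const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double yi = f->b0 * x[i] + f->s1;
        f->s1 = f->b1 * x[i] + f->a1 * yi + f->s2;
        f->s2 = f->b2 * x[i] + f->a2 * yi;
        y[i] = yi;
    }
}

/* Unfolded by 2: two samples per iteration. The values passed from the
   even-sample copy to the odd-sample copy (s1_mid, s2_mid) are plain
   wires; only the values passed from the odd copy back to the even copy
   are registered, so the structure still holds two delays in total. */
void tdf2_run_unfolded2(tdf2_t *f, const double *x, double *y, size_t n)
{
    assert(n % 2 == 0);  /* sketch handles an even number of samples only */
    for (size_t i = 0; i < n; i += 2) {
        double y0     = f->b0 * x[i] + f->s1;
        double s1_mid = f->b1 * x[i] + f->a1 * y0 + f->s2;
        double s2_mid = f->b2 * x[i] + f->a2 * y0;

        double y1 = f->b0 * x[i + 1] + s1_mid;
        f->s1 = f->b1 * x[i + 1] + f->a1 * y1 + s2_mid;
        f->s2 = f->b2 * x[i + 1] + f->a2 * y1;

        y[i]     = y0;
        y[i + 1] = y1;
    }
}
```

Both functions perform the same arithmetic in the same order, so they produce the same output sequence. The unfolded version simply exposes two samples' worth of computation per iteration.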
8.4.3 Loop Unrolling for Mapping SW to HW
In many applications, algorithms are developed in high-level languages, and their implementation
usually involves nested loops. The algorithm must process a defined number of input samples every
second, and the designer needs to map it in HW so that this computational requirement is met.
Usually the requirement is such that the entire algorithm need not be unrolled; unrolling a few
iterations is enough for an effective mapping. The unfolding should be carefully designed, as it may
lead to more memory accesses.
Loop unrolling for SW to HW mapping is usually more involved than applying an unfolding
transformation to a DFG. The code should be carefully analyzed because, in instances with several
nested loops, unrolling the innermost loop may not generate an optimal architecture. The architect
should explore the design space by unrolling different loops in the nesting, and should also try merging a few
nested loops together, to find an effective design.
For example, in code that filters a block of data using an FIR filter, unrolling the
outer loop, which steps over output samples, is a better design option than unrolling the inner
loop, which accumulates a single output sample. With the outer loop unrolled, each data value
read from memory is reused for computing multiple output samples, thus minimizing the memory
accesses. This type of unrolling is difficult to achieve using automatic loop unrolling
techniques. The following example shows an FIR filter implementation that is then unrolled to
compute four output samples in parallel:
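A minimal sketch of such an unrolling in C is given here as an illustration. The names `fir_rolled` and `fir_unrolled4`, the tap count `NUM_TAPS`, and the integer data type are all assumptions made for this sketch. The input buffer is assumed to hold `n_out + NUM_TAPS - 1` samples, so that output `n` is the sum over `k` of `h[k] * x[n + NUM_TAPS - 1 - k]`.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_TAPS 8  /* hypothetical filter length */

/* Straightforward form: the outer loop produces one output per iteration,
   and the inner loop reads NUM_TAPS samples from memory for each output. */
void fir_rolled(const int *x, const int *h, int *y, size_t n_out)
{
    for (size_t n = 0; n < n_out; n++) {
        int acc = 0;
        for (size_t k = 0; k < NUM_TAPS; k++)
            acc += h[k] * x[n + NUM_TAPS - 1 - k];
        y[n] = acc;
    }
}

/* Outer loop unrolled by 4: four accumulators work in parallel on a
   sliding window of four samples held in local variables (registers in
   HW). Each tap reads one coefficient and at most one new sample, and
   both are shared by all four multiply-accumulate operations. */
void fir_unrolled4(const int *x, const int *h, int *y, size_t n_out)
{
    assert(n_out % 4 == 0);  /* sketch handles multiples of four only */
    for (size_t n = 0; n < n_out; n += 4) {
        int acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
        /* Samples needed by outputs n .. n+3 at tap 0. */
        int x0 = x[n + NUM_TAPS - 1];
        int x1 = x[n + NUM_TAPS];
        int x2 = x[n + NUM_TAPS + 1];
        int x3 = x[n + NUM_TAPS + 2];
        for (size_t k = 0; k < NUM_TAPS; k++) {
            int c = h[k];
            acc0 += c * x0;
            acc1 += c * x1;
            acc2 += c * x2;
            acc3 += c * x3;
            /* Slide the window one sample back for the next tap. */
            x3 = x2;
            x2 = x1;
            x1 = x0;
            if (k + 1 < NUM_TAPS)
                x0 = x[n + NUM_TAPS - 2 - k];
        }
        y[n]     = acc0;
        y[n + 1] = acc1;
        y[n + 2] = acc2;
        y[n + 3] = acc3;
    }
}
```

In this sketch, a group of four outputs costs `NUM_TAPS + 3` sample reads and `NUM_TAPS` coefficient reads. The rolled form needs `4 * NUM_TAPS` of each for the same four outputs.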
 