Digital Signal Processing Reference
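[Figure: (a) a four-coefficient FIR filter with coefficients h0 to h3 whose products are summed by a compression tree (CT) to give y[n]; (b) the unrolled filter, in which inputs x[2n] and x[2n+1] feed two sets of coefficients h0 to h3, and two compression trees (CT) produce y[2n] and y[2n+1].]
Figure 8.7 Unrolling an FIR filter. (a) Four-coefficient FIR filter. (b) The filter is unrolled by a factor of 2.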
additional resources. The unfolded architecture can now be explored for further optimization.
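To make the unfolded computation in Figure 8.7(b) concrete, here is a minimal behavioral sketch in Python, not an HDL description. It assumes integer samples, zero initial state and an even-length input. It splits the input into even samples x[2m] and odd samples x[2m+1], and the coefficients into even (h0, h2) and odd (h1, h3) sets. Each loop iteration then consumes two inputs and emits two outputs, y[2m] and y[2m+1]. The function names are hypothetical. The result is checked against a direct-form reference.

```python
from typing import List

def fir_direct(x: List[int], h: List[int]) -> List[int]:
    """Reference FIR: y[n] = sum_k h[k] * x[n - k], with x[n] = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def fir_unfolded_by_2(x: List[int], h: List[int]) -> List[int]:
    """FIR unfolded by a factor of 2: iteration m reads x[2m], x[2m+1]
    and produces y[2m], y[2m+1]."""
    assert len(x) % 2 == 0, "this sketch assumes an even-length input"
    xe, xo = x[0::2], x[1::2]   # x[2m] and x[2m+1]
    he, ho = h[0::2], h[1::2]   # even coefficients h0, h2, ...; odd h1, h3, ...

    def at(seq: List[int], i: int) -> int:
        # Samples before time 0 are zero (filter starts from a cleared state).
        return seq[i] if i >= 0 else 0

    y: List[int] = []
    for m in range(len(xe)):
        # y[2m]   = sum_j h[2j] x[2m-2j]   + sum_j h[2j+1] x[2m-2j-1]
        y_even = (sum(he[j] * at(xe, m - j) for j in range(len(he))) +
                  sum(ho[j] * at(xo, m - 1 - j) for j in range(len(ho))))
        # y[2m+1] = sum_j h[2j] x[2m+1-2j] + sum_j h[2j+1] x[2m-2j]
        y_odd = (sum(he[j] * at(xo, m - j) for j in range(len(he))) +
                 sum(ho[j] * at(xe, m - j) for j in range(len(ho))))
        y.extend([y_even, y_odd])
    return y
```

For example, an impulse through h = [1, 2, 3, 4] yields [1, 2, 3, 4, 0, 0, 0, 0] from both functions. Each iteration uses twice as many multiplications as the original filter. That is the hardware replication the text refers to, and it buys two outputs per cycle.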
Figure 8.7 shows a 4-coefficient FIR filter and the design obtained by unfolding it by a factor of 2. The designer can now treat two CSD multipliers and two adders as one computational unit. This unit can be implemented as a compression tree that produces a sum and a carry. The architecture can further exploit common sub-expression elimination (CSE) techniques (see Chapter 6).
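The computational unit described above can be modeled behaviorally as follows. This is a sketch under stated assumptions, not the book's design. Each constant is recoded into canonical signed digits (CSD), so every multiplication becomes a set of shifted, signed copies of the input. All partial products, plus one incoming operand z, are then reduced by a tree of 3:2 compressors (full adders applied bitwise) until only a sum and a carry remain. The names csd_digits, compression_tree and csd_mac_unit, and the extra operand z, are hypothetical. Python's unbounded integers stand in for the sign extension that negative partial products need in hardware.

```python
from typing import List, Tuple

def csd_digits(c: int) -> List[Tuple[int, int]]:
    """CSD recoding of integer c as (shift, sign) pairs, sign in {+1, -1}:
    c == sum(sign * 2**shift), with no two non-zero digits adjacent."""
    digits = []
    shift = 0
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)   # +1 if c mod 4 == 1, -1 if c mod 4 == 3
            digits.append((shift, d))
            c -= d
        c >>= 1
        shift += 1
    return digits

def partial_products(h: int, x: int) -> List[int]:
    """One shifted, signed copy of x per non-zero CSD digit of h."""
    return [sign * (x << shift) for shift, sign in csd_digits(h)]

def compress_3_to_2(a: int, b: int, c: int) -> Tuple[int, int]:
    """Bitwise full adder: a + b + c == sum + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry

def compression_tree(operands: List[int]) -> Tuple[int, int]:
    """Reduce any number of operands to a (sum, carry) pair with 3:2 compressors."""
    ops = list(operands)
    while len(ops) > 2:
        level: List[int] = []
        while len(ops) >= 3:
            level.extend(compress_3_to_2(ops.pop(), ops.pop(), ops.pop()))
        level.extend(ops)   # 0 to 2 leftover operands pass to the next level
        ops = level
    ops += [0] * (2 - len(ops))
    return ops[0], ops[1]

def csd_mac_unit(h_a: int, x_a: int, h_b: int, x_b: int, z: int) -> Tuple[int, int]:
    """Two CSD multipliers and two additions, kept in carry-save form:
    returns (sum, carry) with sum + carry == h_a*x_a + h_b*x_b + z."""
    operands = partial_products(h_a, x_a) + partial_products(h_b, x_b) + [z]
    return compression_tree(operands)
```

For instance, `s, c = csd_mac_unit(7, 5, 3, -2, 10)` gives a pair whose sum s + c is 7*5 + 3*(-2) + 10 = 39. Only one carry-propagate addition, s + c, is needed at the very end. This is why keeping the result as a sum and a carry shortens the critical path.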
It is important to point out that the design can also be pipelined to increase effective throughput. In many designs, simple pipelining without any unfolding may cost less hardware than unfolding, because unfolding creates multiple copies of the entire design.
8.4.5 Unfolding for Effective Use of FPGA Resources
Consider a design instance where the throughput must be increased by a factor of 2. Assume the designer is using an FPGA with embedded DSP48 blocks. The designer can easily add pipeline registers and retime them between a multiplier and an adder, as shown in Figure 8.8. The
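[Figure: each unit, mapped on a DSP48, multiplies input x[n] by one of the coefficients h3, h2, h1, h0; multiplier registers (mul_reg[0], ...) and adder registers add_reg[3] to add_reg[0] pipeline the adder chain, which starts from 0 and produces y[n].]
Figure 8.8 Pipelined FIR filter for effective mapping on FPGAs with DSP48 blocks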
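The cycle-level sketch below illustrates the retimed structure of Figure 8.8. It assumes a transposed-form arrangement, as suggested by the register names in the figure. The input x[n] is broadcast to every tap, each product is captured in a multiplier register mul_reg[k], and the adder chain starts from 0 at the h3 end and passes through add_reg[3] down to add_reg[0], which drives y[n]. This exact topology, and the function name, are assumptions for illustration. All registers update together on each clock edge. Every input-to-output path crosses exactly one mul_reg and one add_reg, so the output equals the ordinary FIR result delayed by 2 cycles.

```python
from typing import List

def fir_pipelined_transposed(x: List[int], h: List[int]) -> List[int]:
    """Cycle-level model of a pipelined transposed-form FIR
    (hypothetical mapping: one multiplier + adder per DSP48-style unit)."""
    n_taps = len(h)
    mul_reg = [0] * n_taps   # product registers, one per tap
    add_reg = [0] * n_taps   # adder-chain registers; add_reg[0] drives y
    latency = 2              # one mul_reg stage + one add_reg stage
    out: List[int] = []
    for t in range(len(x) + latency):          # extra cycles flush the pipeline
        xt = x[t] if t < len(x) else 0
        out.append(add_reg[0])                 # y is read from add_reg[0]
        # Synchronous update: next state computed only from current register values.
        next_add = [mul_reg[k] + (add_reg[k + 1] if k + 1 < n_taps else 0)
                    for k in range(n_taps)]
        next_mul = [h[k] * xt for k in range(n_taps)]
        mul_reg, add_reg = next_mul, next_add
    return out[latency:]                        # drop the pipeline latency
```

The model shows only that the retimed registers change latency, not the filter's response. The throughput gain on an FPGA comes from the shorter register-to-register path, which lets the design run at a higher clock rate.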