Digital Signal Processing Reference
respective daisy-chain tap delay line, the result needs to be subtracted. To implement this
subtraction, the architecture selects the one's complement of the ROM output, and a
cumulative correction term for all six sub-filters is added as 4'b0110 in the compression tree.
The CPA is moved outside the accumulation module, and the partial sum and partial carry from
the compression tree are latched in two sets of accumulator registers. The contents of these
registers are also fed back to the compression tree, which makes it a 9:2 compression tree (six ROM
outputs, the correction term, and the two fed-back words). If necessary, the CPA can work on the
slower output sample clock clk_G, whereas the compression tree operates on the fast bit clock
clk_g. The final results from the compression tree are latched into two sets of registers clocked
with clk_G for final addition using a CPA, and the two accumulator registers are reset to perform
the next set of computations.
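The subtraction trick above can be sketched in a few lines of Python. This is only an illustrative model, not the book's design: the 12-bit width, the ROM output values, and the helper names are assumptions. It relies on the two's complement identity -x = ~x + 1, so negating six ROM outputs by inversion needs a single correction of 6 (4'b0110). The operands are reduced in carry-save form with a 3:2 compressor, and one carry-propagate addition (the CPA) is performed at the end.

```python
WIDTH = 12                      # assumed datapath width in bits
MASK = (1 << WIDTH) - 1


def ones_complement(value: int) -> int:
    """Bitwise inversion of a WIDTH-bit word."""
    return ~value & MASK


def to_signed(value: int) -> int:
    """Interpret a WIDTH-bit pattern as a two's complement integer."""
    return value - (1 << WIDTH) if value & (1 << (WIDTH - 1)) else value


def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor: a + b + c == partial_sum + partial_carry (mod 2**WIDTH)."""
    partial_sum = a ^ b ^ c
    partial_carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum & MASK, partial_carry & MASK


def negated_sum(rom_outputs: list[int]) -> int:
    """Compute -(sum of ROM outputs) using inversion plus one correction term."""
    correction = len(rom_outputs)            # six sub-filters -> 4'b0110
    operands = [ones_complement(w & MASK) for w in rom_outputs] + [correction]
    ps, pc = 0, 0                            # the two accumulator registers
    for op in operands:
        ps, pc = csa(ps, pc, op)             # no carry propagation here
    return to_signed((ps + pc) & MASK)       # single CPA outside the tree


rom_outputs = [17, 5, 42, 3, 8, 11]          # hypothetical sign-bit ROM outputs
result = negated_sum(rom_outputs)
correction_bits = format(len(rom_outputs), "04b")
```

In hardware the compression tree reduces all operands in one bit-clock cycle; the loop here only mimics the arithmetic, not the timing.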
6.8.4 DA Implementation without Look-up Tables
A LUT-less DA implementation uses multiplexers. If the parallel implementation is extended to use
M = K, then each shift register is connected to a two-entry LUT that selects either 0 or the
corresponding coefficient. Such a LUT can be implemented as a 2:1 MUX.
Designs for a 4-coefficient FIR filter are shown in Figure 6.22, using compression tree-based and
adder tree-based implementations. For the adder tree design, the architecture can be pipelined at
each adder stage if required.
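As a rough behavioural sketch of the MUX-based scheme, the Python model below computes one output of a 4-coefficient FIR filter bit-serially. The function name, word length, and data values are assumptions chosen for illustration. Each tap's 2:1 MUX passes either 0 or its coefficient, the selected values are summed (standing in for the adder or compression tree), and the sign-bit cycle is subtracted because the inputs are in two's complement.

```python
def da_fir_output_mux(samples: list[int], coeffs: list[int], nbits: int) -> int:
    """One FIR output computed bit-serially with one 2:1 MUX per tap.

    samples: the K most recent inputs, signed integers that fit in nbits bits.
    coeffs:  the K filter coefficients (signed integers).
    """
    mask = (1 << nbits) - 1
    regs = [s & mask for s in samples]        # two's complement shift registers
    acc = 0
    for bit in range(nbits):                  # LSB first
        # Each MUX selects 0 or its coefficient, controlled by the current bit.
        selected = [c if (r >> bit) & 1 else 0 for r, c in zip(regs, coeffs)]
        partial = sum(selected)               # adder tree / compression tree
        if bit == nbits - 1:
            acc -= partial << bit             # sign bit has negative weight
        else:
            acc += partial << bit
    return acc


samples = [3, -2, 7, -8]                      # hypothetical 4-bit inputs
coeffs = [5, -3, 2, 1]                        # hypothetical coefficients
y = da_fir_output_mux(samples, coeffs, nbits=4)
```

The result can be checked against the ordinary sum of products, which is what the test values above are meant for.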
The architectures of LUT-based and LUT-less implementations can be mixed to get a hybrid design.
The resultant design has a mix of MUX- and LUT-based implementation and requires reduced-size
LUTs.
Example: This example implements a DA-based biquad (second-order) IIR filter. The transfer
function of the filter is:
$$H(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 - a_1 z^{-1} - a_2 z^{-2}}$$
This transfer function translates into a difference equation given by:
$$y[n] = b_0 x[n] + b_1 x[n-1] + b_2 x[n-2] + a_1 y[n-1] + a_2 y[n-2]$$
The difference equation can be easily mapped onto a DA-based architecture. Either two ROMs can be
designed, one for the feedforward and one for the feedback coefficients, or a unified ROM-based
design can be realized. The two designs are shown in Figure 6.23. The value of the output, once
computed, is loaded in parallel to a shift register for y[n-1].
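The unified-ROM variant can be modelled in Python as a minimal sketch. The word length, the number of fractional bits, the coefficient values, and all function names are assumptions for illustration, not values from Figure 6.23. A single 32-entry ROM is addressed by one bit from each of the five words x[n], x[n-1], x[n-2], y[n-1], and y[n-2]; the computed output is truncated and loaded into the y[n-1] register for the next sample.

```python
NBITS = 12   # assumed word length of x and y
FRAC = 4     # assumed number of fractional bits in the coefficients
# Hypothetical coefficients scaled by 2**FRAC, ordered b0, b1, b2, a1, a2
COEFFS = [4, 8, 4, 8, -4]

# Unified ROM: address bit j set means COEFFS[j] is included in the entry.
ROM = [sum(c for j, c in enumerate(COEFFS) if (addr >> j) & 1)
       for addr in range(1 << len(COEFFS))]


def da_dot(words: list[int]) -> int:
    """Bit-serial inner product of COEFFS with NBITS-bit two's complement words."""
    mask = (1 << NBITS) - 1
    regs = [w & mask for w in words]
    acc = 0
    for bit in range(NBITS):
        addr = sum(((r >> bit) & 1) << j for j, r in enumerate(regs))
        term = ROM[addr] << bit
        acc += -term if bit == NBITS - 1 else term   # sign bit is subtracted
    return acc


def biquad_da(x: list[int]) -> list[int]:
    """Biquad IIR filter using the DA inner product above."""
    x1 = x2 = y1 = y2 = 0
    out = []
    for xn in x:
        yn = da_dot([xn, x1, x2, y1, y2]) >> FRAC    # truncate to word format
        out.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn      # y[n] is loaded in parallel as the new y[n-1]
    return out


def biquad_direct(x: list[int]) -> list[int]:
    """Reference model: the difference equation with ordinary multiplications."""
    b0, b1, b2, a1, a2 = COEFFS
    x1 = x2 = y1 = y2 = 0
    out = []
    for xn in x:
        yn = (b0 * xn + b1 * x1 + b2 * x2 + a1 * y1 + a2 * y2) >> FRAC
        out.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return out


x = [10, -7, 3, 0, 15, -20, 5, 5]             # hypothetical input samples
y_da = biquad_da(x)
y_ref = biquad_direct(x)
```

The model assumes the output stays within the NBITS range; a real design would need scaling or saturation to guarantee that. The two-ROM variant would split `COEFFS` into feedforward and feedback parts and add the two ROM outputs.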
6.9 FFT Architecture using FIR Filter Structure
To fully exploit the potential optimizations in mapping a DFT algorithm in hardware using the
techniques listed in this chapter, the DFT algorithm can be implemented as an FIR filter. This
requires rewriting the DFT expression as a convolution summation. The Bluestein Chirp-z Transform
(CZT) algorithm transforms the DFT computation problem into FIR filtering [25]. The CZT expresses
the nk term in the DFT summation in terms of (k - n) so that it can be written as a convolution
summation.
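The rewriting rests on the identity nk = (n^2 + k^2 - (k - n)^2)/2. With W = exp(-j2π/N), the DFT becomes X[k] = W^(k^2/2) Σ_n (x[n] W^(n^2/2)) W^(-(k-n)^2/2), which is a pre-multiplication by a chirp, a convolution with the conjugate chirp, and a post-multiplication. The Python sketch below checks this numerically against a direct DFT; the function names and test vector are assumptions, and the convolution is evaluated naively rather than with an optimized FIR structure.

```python
import cmath


def dft_direct(x: list[complex]) -> list[complex]:
    """Reference DFT computed straight from the definition."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts)
                for n in range(n_pts))
            for k in range(n_pts)]


def dft_bluestein(x: list[complex]) -> list[complex]:
    """DFT written as chirp multiply, convolution, chirp multiply."""
    n_pts = len(x)

    def chirp(m: int) -> complex:
        # W^(m^2 / 2) with W = exp(-j * 2 * pi / N)
        return cmath.exp(-1j * cmath.pi * m * m / n_pts)

    g = [x[n] * chirp(n) for n in range(n_pts)]          # pre-multiplication
    out = []
    for k in range(n_pts):
        # FIR-style convolution of g with the conjugate chirp h[m] = W^(-m^2/2)
        conv = sum(g[n] * chirp(k - n).conjugate() for n in range(n_pts))
        out.append(chirp(k) * conv)                      # post-multiplication
    return out


x = [1, 2, 0, -1, 3]                                     # hypothetical input
X_ref = dft_direct(x)
X_czt = dft_bluestein(x)
max_err = max(abs(a - b) for a, b in zip(X_ref, X_czt))
```

Because the inner sum depends only on (k - n), it has the form of an FIR filter with fixed chirp coefficients, which is what allows the FIR optimizations from this chapter to be applied.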