Multiplier-less Multiplication by Constants - Digital Design of Signal Processing Systems: A Practical Approach


respective daisy-chain tap delay-line, the result needs to be subtracted. To implement this subtraction, the architecture selects the one's complement of the output from the ROM, and a cumulative correction term for all six sub-filters is added as 4'b0110 in the compression tree: because a two's-complement negation is the one's complement plus 1, the six +1 terms combine into the single constant 6. The CPA is moved outside the accumulation module, and the partial sum and partial carry from the compression tree are latched in the two sets of accumulator registers. The contents of these registers are also fed back as inputs to the compression tree, which makes the compression tree 9:2 (six ROM outputs, the correction term, and the fed-back partial sum and partial carry). The CPA can therefore work on the slower output sample clock clk_G, whereas the compression tree operates on the fast bit clock clk_g. The final results from the compression tree are latched into two sets of registers clocked with clk_G for final addition using the CPA, and the two accumulator registers are reset to perform the next set of computations.
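The carry-save accumulation and the one's-complement correction can be checked with a small behavioral model. The Python sketch below is hypothetical: the 16-bit width, the function names, and the assumption that each cycle's six ROM outputs arrive already aligned to their bit weight are illustrative choices, not details of the original architecture. A tree of 3:2 compressors reduces the nine operands to a sum/carry pair each cycle, and a single CPA adds the pair only at the end.

```python
W = 16                       # hypothetical datapath width in bits
MASK = (1 << W) - 1          # arithmetic is modulo 2^W, as in fixed-width hardware


def csa(a, b, c):
    """3:2 compressor (one row of full adders): a + b + c == sum + carry (mod 2^W)."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s & MASK, carry & MASK


def compress(operands):
    """Reduce any number of operands to a (sum, carry) pair with a tree of 3:2 compressors."""
    vals = [v & MASK for v in operands]
    while len(vals) > 2:
        nxt = []
        while len(vals) >= 3:
            nxt.extend(csa(vals.pop(), vals.pop(), vals.pop()))
        nxt.extend(vals)         # 0-2 leftover operands pass straight to the next level
        vals = nxt
    vals.extend([0] * (2 - len(vals)))
    return vals[0], vals[1]


def accumulate(cycles, subtract):
    """cycles[i]: the six (already aligned) ROM outputs for bit-clock cycle i.
    subtract[i]: True in the cycle whose ROM outputs must be subtracted."""
    acc_sum, acc_carry = 0, 0            # the two accumulator registers
    for rom_outs, sub in zip(cycles, subtract):
        if sub:
            terms = [~r & MASK for r in rom_outs]    # one's complement of each output
            correction = len(rom_outs)               # +1 per sub-filter: 6 = 4'b0110
        else:
            terms = list(rom_outs)
            correction = 0
        # 6 ROM outputs + correction + fed-back sum and carry = 9 inputs (9:2 tree)
        acc_sum, acc_carry = compress(terms + [correction, acc_sum, acc_carry])
    return (acc_sum + acc_carry) & MASK   # single CPA, needed only at the slow clock
```

Because only the compressor tree sits in the fast loop, the carry-propagation delay of the CPA does not limit the bit-clock rate in this model.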

6.8.4 DA Implementation without Look-up Tables

LUT-less DA implementation uses multiplexers. If the parallel implementation is extended to use M = K, then each shift register is connected to a two-entry LUT that selects either 0 or the corresponding coefficient. This LUT can be implemented as a 2:1 MUX.

Designs for a 4-coefficient FIR filter are shown in Figure 6.22, using compression-tree and adder-tree based implementations. For the adder-tree design, the architecture can be pipelined at each adder stage if required.
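A minimal Python sketch of the LUT-less, adder-tree variant follows, assuming integer coefficients and B-bit two's-complement tap values processed LSB first; the function names are hypothetical and not taken from Figure 6.22. Bit b of each tap drives a 2:1 MUX that passes 0 or its coefficient, an adder tree sums the MUX outputs, and the sum is shift-accumulated, with the sign-bit cycle subtracted.

```python
def mux2(select, coeff):
    """Two-entry LUT implemented as a 2:1 MUX: outputs 0 or the coefficient."""
    return coeff if select else 0


def adder_tree(values):
    """Sum values pairwise; each pass of the loop is one adder stage (a pipeline point)."""
    stage = list(values)
    while len(stage) > 1:
        if len(stage) % 2:
            stage.append(0)
        stage = [stage[i] + stage[i + 1] for i in range(0, len(stage), 2)]
    return stage[0]


def da_fir_lutless(h, taps, B):
    """Bit-serial, LUT-less DA for one FIR output.
    h: K integer coefficients; taps: x[n], x[n-1], ..., x[n-K+1] as B-bit two's-complement ints."""
    acc = 0
    for b in range(B):                       # one bit-clock cycle per input bit, LSB first
        bits = [(x >> b) & 1 for x in taps]  # bit b of every tap (arithmetic shift keeps sign bits)
        partial = adder_tree([mux2(bit, hk) for bit, hk in zip(bits, h)])
        if b == B - 1:
            acc -= partial << b              # sign bit carries weight -2^(B-1)
        else:
            acc += partial << b
    return acc
```

For example, `da_fir_lutless([3, -1, 4, 2], [5, -3, 7, -8], 4)` equals the direct dot product 3·5 + (−1)(−3) + 4·7 + 2(−8).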

The LUT and LUT-less architectures can be mixed to obtain a hybrid design. The resulting design combines MUX- and LUT-based implementations and requires smaller LUTs.
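As a hedged illustration of the hybrid idea, the Python sketch below assumes a hypothetical split in which the first g taps share one 2^g-entry LUT while each remaining tap uses a 2:1 MUX; both the partition and the names are illustrative, not the book's design. Setting g = 0 gives a fully LUT-less filter, and g = K gives a single full-size LUT.

```python
def build_lut(coeffs):
    """2^g-entry DA LUT: entry `addr` holds the sum of the coefficients whose address bit is 1."""
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << len(coeffs))]


def da_fir_hybrid(h, taps, B, g):
    """Hybrid DA: the first g taps address one LUT; each remaining tap drives a 2:1 MUX.
    Taps are B-bit two's-complement ints; the split point g is a hypothetical parameter."""
    lut = build_lut(h[:g])
    acc = 0
    for b in range(B):
        bits = [(x >> b) & 1 for x in taps]
        addr = sum(bit << i for i, bit in enumerate(bits[:g]))              # LUT address
        muxed = sum(hk if bit else 0 for bit, hk in zip(bits[g:], h[g:]))   # MUX outputs
        partial = lut[addr] + muxed
        if b == B - 1:
            acc -= partial << b      # sign-bit cycle is subtracted
        else:
            acc += partial << b
    return acc
```

The LUT size grows as 2^g while the MUX count grows linearly with K − g, which is the trade-off the hybrid design exploits.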

Example: This example implements a DA-based biquadratic (biquad) IIR filter. The transfer function of the filter is:

$$H(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 - a_1 z^{-1} - a_2 z^{-2}}$$

This transfer function translates into a difference equation given by:

$$y[n] = b_0 x[n] + b_1 x[n-1] + b_2 x[n-2] + a_1 y[n-1] + a_2 y[n-2]$$

The difference equation maps easily onto a DA-based architecture. Either two ROMs can be designed, one for the feedforward and one for the feedback coefficients, or a unified ROM-based design can be realized. The two designs are shown in Figure 6.23. Once computed, the output value is loaded in parallel into a shift register for y[n−1].
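The Python sketch below models the unified-ROM option under assumed fixed-point conventions: coefficients scaled by 2^F, B-bit two's-complement samples, and saturation of y[n]. These are hypothetical choices, not taken from Figure 6.23. With five inputs x[n], x[n−1], x[n−2], y[n−1] and y[n−2], the unified ROM has 2^5 = 32 entries, whereas the two-ROM option would need 2^3 = 8 feedforward and 2^2 = 4 feedback entries. A direct evaluation of the difference equation is included for comparison.

```python
F = 8    # hypothetical number of fractional bits in the coefficients
B = 12   # hypothetical word length of x[n] and y[n] (two's complement)


def saturate(v, bits):
    """Clamp v to the range of a `bits`-bit two's-complement word."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, v))


def build_rom(coeffs):
    """Unified DA ROM: entry `addr` is the sum of coefficients whose address bit is 1."""
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << len(coeffs))]


def biquad_da(xs, coeffs, bits=B, frac=F):
    """coeffs = [b0, b1, b2, a1, a2], integers scaled by 2^frac; xs are bits-bit samples."""
    rom = build_rom(coeffs)            # 2^5 = 32 entries
    x1 = x2 = y1 = y2 = 0              # shift-register contents x[n-1], x[n-2], y[n-1], y[n-2]
    out = []
    for x0 in xs:
        regs = [x0, x1, x2, y1, y2]
        acc = 0
        for b in range(bits):          # one bit-clock cycle per bit, LSB first
            addr = sum(((r >> b) & 1) << i for i, r in enumerate(regs))
            term = rom[addr] << b
            acc = acc - term if b == bits - 1 else acc + term   # sign bit is subtracted
        y0 = saturate(acc >> frac, bits)   # drop fractional bits, keep a bits-bit output
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0                # y[n] is loaded in parallel into the y[n-1] register
    return out


def biquad_direct(xs, coeffs, bits=B, frac=F):
    """Reference: the difference equation evaluated with ordinary multiplications."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0
    out = []
    for x0 in xs:
        y0 = saturate((b0 * x0 + b1 * x1 + b2 * x2 + a1 * y1 + a2 * y2) >> frac, bits)
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out
```

With, for instance, `coeffs = [64, 128, 64, 128, -64]` (b0 = 0.25, b1 = 0.5, b2 = 0.25, a1 = 0.5, a2 = −0.25 at F = 8), both functions produce the same output sequence, since the DA loop forms the same integer sum of products before truncation.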

6.9 FFT Architecture using FIR Filter Structure

To fully exploit the potential optimizations in mapping a DFT algorithm to hardware using the techniques listed in this chapter, the DFT algorithm can be implemented as an FIR filter. This requires rewriting the DFT expression as a convolution summation. The Bluestein Chirp-z Transform (CZT) algorithm transforms the DFT computation problem into FIR filtering [25]. The CZT expresses the nk term in the DFT summation in terms of (k − n) so that it can be written as a convolution summation.
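The rewriting rests on the identity $nk = \tfrac{1}{2}\left[n^2 + k^2 - (k-n)^2\right]$. With $W = e^{-j2\pi/N}$, this gives $X[k] = W^{k^2/2} \sum_{n} \left(x[n]\,W^{n^2/2}\right) W^{-(k-n)^2/2}$, i.e. a convolution of the premultiplied input with the fixed chirp $h[m] = W^{-m^2/2}$. The Python sketch below checks this numerically against a direct DFT; it evaluates the convolution directly and is only an illustration of the identity, not an FFT-accelerated CZT.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used only as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def chirp(m, N):
    """W^(m^2/2) with W = exp(-j*2*pi/N)."""
    return cmath.exp(-1j * cmath.pi * m * m / N)


def czt_dft(x):
    """DFT computed as an FIR filter (Bluestein): premultiply, convolve with a chirp, postmultiply."""
    N = len(x)
    pre = [x[n] * chirp(n, N) for n in range(N)]                  # x[n] * W^(n^2/2)
    h = {m: chirp(m, N).conjugate() for m in range(-(N - 1), N)}  # impulse response W^(-m^2/2)
    return [chirp(k, N) * sum(pre[n] * h[k - n] for n in range(N))   # FIR convolution sum
            for k in range(N)]
```

Since h[m] depends only on N, its taps are constants, which is what allows the constant-coefficient FIR techniques of this chapter to be applied.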
