Digital Signal Processing Reference
In-Depth Information
8.5.3 Folded Architectures for FFT Computation
Most signal processing algorithms have a regular structure. The fast Fourier transform (FFT) is
another example to demonstrate the folding technique. An FFT algorithm efficiently implements
DFT summation:
N 1
0 x½W n N k ¼
X½¼
0
;
1
;
2
; ... ; N
1
ð
8
:
6
Þ
where
2 p nk
N
j
W n N ¼ e
are called twiddle factors. N is the length of the signal x[n], and n and k are the indices in time and
frequency, respectively.
The number of real multiplications required to perform a complex multiplication can be first
reduced from four to three by the following simple mathematical manipulation:
ð
aþ jb
Þ c þ jd
ð
Þ
¼ acbd
ð
Þ þjadþ bc
ð
Þ
¼ dab
ð
Þþacd
ð
Þþcaþ b
ð
ð
Þ aðcdÞ
Þj
There are a variety of algorithmic design options cited in the literature that reduce the number of
complex multiplications while implementing (8.6). These options can be broadly divided into two
categories: butterfly-based computation and mathematical transformation.
For butterfly-based computation a variety of algorithms based on radix-2, radix-4, radix-8,
hybrid radix, radix-2 2 , radix-2 3 , and radix-2/2 2 /2 3 [9, 10] and a mix of these are proposed. The
radix-4 and radix-8 algorithms use fewer complex multiplications than radix-2 but require N to be
a power of 4 and 8, respectively. To counter the complexity and still gain the benefits, radix-2 2 and
radix-2 3 algorithms are proposed in [11] that still use radix-2 type butterflies and reduce the
number of complex multiplications. Most of these radix-based algorithms reduce the number of
complex multiplications. In general, a radix-r algorithm requires log r N stages of N/r butterflies,
and preferably N needs to be a power of r or the data is zero-padded to make the signal of requisite
length.
A radix-2 N-point FFT divides the computation into log 2 N stages where each stage executes N/2
two-point butterflies. A flow graph for 8-point FFT implementation using a radix-2 butterfly is
shown in Figure 8.13(a). An 8-point FFT can also be computed by using two stages of radix-4
butterflies,shown in Figure 8.13(b); each stage would contain two of these butterflies.
A pipelined FDA requires the data to be placed at the input of every butterfly in every cycle, and
then the design works in lock step to generate N output in every clock cycle with an initial latency of
the number of pipeline stages in the design. This design requires placing of N samples at the input of
the architecture in every clock cycle. This for large values of N is usually not possible, so then the
designer needs to resort to folded architecture.
When there aremore clock cycles available for the design, the architecture is appropriately folded.
In general, a folded architecture can realize M butterflies in parallel to implement an N-point FFT,
Search WWH ::




Custom Search