Digital Signal Processing Reference
In-Depth Information
access conflicts. The next stage swaps the storage of results for evry two sets of outputs in memories
RAM 0 and RAM 1 . The cycle-by-cycle storage of results is shown in Figure 8.13(b).
8.5.5 Systolic Folded Architecture
The FFTalgorithmcan also be realized as systolic folded architecture, using anMDFor SDF basis. In
this architecture all the butterflies in one stage of the FFT flow graph are folded and mapped on a
single butterfly processing element. To ensure systolic operation, intermediate registers and multi-
plexers are placed such that the design keeps processing data for computation of FFT and correct
inputs are always available to every PE in every clock cycle for full utilization of the hardware.
Implementing the decimation in a frequency FFTalgorithm, the architecture requires log 2 N PEs.
To enable fully systolic operationwith 100%utilization in theMDF configuration, the two input data
streams are provided to PE 1 . The upper streamserially feeds the first N/2 samples of input data, x 0 , x 1 ,
... , x N/2-1 , and the lower stream serially inputs the other N/2 samples, x N/2 , x N/2 þ 1 , ... , x N 1. Then
PE 1 sequentially computes one radix-2 butterfly of decimation in the frequency FFT algorithm in
every clock cycle and passes the output to the next stage of folded architecture. To synchronize the
input to PE 2 , the design places a set of N/4 registers each in upper and lower branches. The
multiplexers ensure the correct set of data are fed to PE 1 and PE 2 in every clock cycle. Similarly each
stage requires placement of N/2 s registers at each input for s
, log 2 N.
Example: Figure 8.14 shows the details of the design for a systolic FFT implementation. The data
is serially fed to PE 1 and the output is stored in registers for first two computations of butterfly
operations. These outputs are then used for the next two set of computation.
The cycle-by-cycle detailed working of the architecture is shown in Figure 8.16. The first column
shows the cycles where the design repeats computation after every fourth cycle and starts computing
the 8-point FFTof a new sequence of 8 data points. The second column shows the data being serially
fed to the design. The first cycle takes x 0 and x 4 , and then in each subsequent cycle two data values are
fed as shown in this column. The columns labeled as PE 1 , PE 2 and PE 3 show the butterfly
computations. The intermediate values stored in registers R 0 , R 1 , R 2 and R 3 are also shown in the
table as y ij , where i is the level and j is the index..
The multiplexer selects signals given in Table 8.1. The controller is a simple 2-bit counter, where
the most significant bit (MSB) of the counter is used for sel 1 and the least significant bit (LSB) is
used for sel 2 . The counter is used to read the values of twiddle factors from four-deep ROM. The
RTL Verilog code is given here:
¼
2, 3,
...
// Systolic FFT architecture for 8-point FFT computation
module FFTSystolic
(
input signed [15:0] xi_re, xi_im, xj_re, xj_im,
input clk, rst_n,
output signed [15:0] yi_re, yi_im, yj_re, yj_im);
reg signed [15:0] R_re[0:5], R_im[0:5];
reg [1:0] counter;
wire signed [15:0] W_re[0:3], W_im[0:3];
// Twiddle factors of three butterflies
reg signed [15:0] W0_re, W0_im, W1_re, W1_im, W2_re, W2_im;
wire sel1, sel2;
wire signed [15:0] xj_re1, xj_im1, xj_re2, xj_im2;
wire signed [15:0] yi_re1, yi_im1, yj_re1, yj_im1,
yi_re2, yi_im2, yj_re2, yj_im2;
Search WWH ::




Custom Search