Hardware Reference
In-Depth Information
Then the object is generated using the assembler and the linker. Since the FE-GA
is managed by CPUs, users need to create FE-GA control codes and attach them
to the reference program. Finally, the combined program for CPUs and FE-GA is
debugged on the integrated CPU and FE-GA simulator or on a real chip.
3.2.7
Implementation of Fast Fourier Transform on FE-GA
Fast Fourier transform (FFT), which is a common algorithm used in media process-
ing, was implemented on the FE-GA for evaluation. In this subsection, details of
the implementation are described. The algorithm used for mapping and evaluation
was a radix-2 decimation-in-time FFT, which is the most common form of the
Cooley-Tukey algorithm [ 53, 54 ]. We used this algorithm because the radix-2 FFT
is simple, and the decimation-in-time FFT has a multiplication of data and a twiddle
factor in the first part of calculation before fixed-point processing. It avoids having
to use wiring in order for supplying a twiddle factor into the middle of the cell array;
therefore, it preserves the resources of the cells for the fixed-point processing. The
format of input and output data is 16-bit fixed point (Q-15 format).
The FFT is calculated by repeating the butterfly calculation as follows (a decima-
tion-in-time algorithm):
a x yWb x yW
=+×
,
=−×
,
where a , b , x , and y are imaginary data and W is a twiddle factor. The equation can
be divided into a real part and an imaginary part as follows:
ar
=+× −×
xr
yr
Wr
yi
Wi ai
,
=+×+×
xi
yr
Wi
yi
Wr
,
br
=−× +×
xr
yr
Wr
yi
Wi bi
,
=−×−×
xi
yr
Wi
yi
Wr
.
Figure 3.57a shows a data flow graph of the above equations. The circled “×,”
“<<,” “+,” and “−,” respectively, indicate multiplication, 1-bit left shift, addition,
and subtraction. Because the data are 16-bit fixed point, the upper 16 bits of the
multiplied data with W should be left shifted. Figure 3.57b depicts a mapping of
the data flow graph on the 4 × 4 cells (the upper half part of the operation cell array).
A rectangle in each cell indicates an operation, and an arrow between rectangles
shows 16-bit data (dashed arrow is 1-bit data). In the rectangles, “D” inserts 1-cycle
delay. “ROTL” shifts 1 bit to the left and inserts an input 1-bit data into the LSB.
“MSB” outputs the MSB as 1-bit data, which is realized by an “add with carry”
operation. “~MSB” outputs the complement of the MSB as 1-bit data. Note that
the MLT cells, normally set in the second row of the cell array, are set in the first row
in order to map the FFT. After multiplications at the first row, the MSB of the lower
16-bit data is extracted, and the upper 16-bit datum with 1-cycle delay applied
is shifted 1 bit to the left with the MSB of the lower data attached to the LSB.
Search WWH ::




Custom Search