[Fig. 3.60: Thread definition and sequence for 1,024-point FFT. Threads 1 to 10, one per FFT stage, each with a mapping for the ALU/MLT cells, the crossbar (XB), and the LS cells.]
A1: Mapping for ALU and MLT cells (1 type)
X1, X2: Mapping for the crossbar (2 types)
L1, L2: Mapping for LS cells (2 types)
Table 3.14 Evaluated cycles of 1,024- and 2,048-point FFT on FE-GA

Number of     Number of      Breakdown of the cycles
FFT points    total cycles   Operation   Load/store init. delay   Config. preloading
1,024         2,747          2,560       80                       107
2,048         5,838          5,632       88                       118
different banks, and the location in the banks differs at each FFT stage. Therefore,
the total number of threads is the same as that of the FFT stages, as illustrated in
Fig. 3.60.
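The thread count stated above can be sketched as follows. This is a minimal illustration that assumes a radix-2 decomposition, which is consistent with the ten threads shown for 1,024 points in Fig. 3.60. The function name fft_stage_count is hypothetical and is not part of any FE-GA tool.

```python
def fft_stage_count(n_points: int) -> int:
    """Number of radix-2 FFT stages for an n_points-point transform.

    Assumption: one thread per stage, so this is also the thread count.
    """
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two >= 2")
    # For a power of two, log2(n) equals bit_length() - 1.
    return n_points.bit_length() - 1


if __name__ == "__main__":
    for n in (1024, 2048):
        print(f"{n}-point FFT: {fft_stage_count(n)} stages/threads")
```

Under this assumption, the two evaluated sizes need 10 and 11 threads, respectively.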
The performance of 1,024-point and 2,048-point FFT on FE-GA was evaluated.
For this evaluation, all the data, including the input data and twiddle factors,
were placed in the local memory, and the configurations were stored in the
configuration buffer. Therefore, the evaluated execution cycles include operations,
data loads and stores from/to the local memory, thread switching, and configuration
preloading to the operation cells. Note that the cycles exclude the bit-reversing
process. Table 3.14 gives the evaluation results. The operations account for most of
the total cycles (2,560 of 2,747 for 1,024 points), and there are relatively few
overhead cycles consisting of initial delay (data and configuration load) and
thread switching.
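The overhead share implied by Table 3.14 can be checked with a short calculation. The sketch below transcribes the cycle counts from the table. It assumes that the largest breakdown entry in each row is the operation count, following the statement that operations dominate. The names TABLE_3_14 and overhead_share are hypothetical.

```python
# Cycle counts transcribed from Table 3.14 (total, and the three breakdown entries).
TABLE_3_14 = {
    1024: {"total": 2747, "breakdown": (2560, 80, 107)},
    2048: {"total": 5838, "breakdown": (5632, 88, 118)},
}


def overhead_share(n_points: int) -> float:
    """Fraction of total cycles that is not spent on operations."""
    row = TABLE_3_14[n_points]
    # Sanity check: the breakdown entries must add up to the total.
    if sum(row["breakdown"]) != row["total"]:
        raise ValueError("breakdown does not sum to total")
    # Assumption: the largest entry is the operation count.
    operation = max(row["breakdown"])
    return (row["total"] - operation) / row["total"]


if __name__ == "__main__":
    for n in TABLE_3_14:
        print(f"{n}-point FFT: overhead {overhead_share(n):.1%}")
```

With these figures, the non-operation cycles come to roughly 7% of the total for 1,024 points and roughly 4% for 2,048 points.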
3.3 Matrix Engine (MX)
The Matrix Engine (MX) is a special-purpose processor core with a massively parallel
SIMD architecture, developed for arithmetic-intensive applications such as image and
signal processing. There are two versions of the MX core, MX-1 and MX-2, which are
described in the following subsections.
 