An Optimization of 16-Point Discrete Cosine Transform Implemented into a FPGA as a Design for a Spectral First Level Surface Detector Trigger in Extensive Air Shower Experiments - Applications of Digital Signal Processing - page 388


A_4 = x_3 - x_4;  B_4 = -A_4 - A_5;  C_4 = B_4;  D_4 = (S_2 - S_6) C_4;  E_4 = -D_64 - D_4;  F_4 = E_4;  S_5 X_5 = F_7 + F_4

A_5 = x_2 - x_5;  B_5 = A_5 + A_6;  C_5 = B_5;  D_5 = S_4 C_5;  E_5 = D_5;  F_5 = E_5 + E_7;  S_1 X_1 = F_5 + F_6

A_6 = x_1 - x_6;  B_6 = A_6 + A_7;  C_6 = B_6;  D_6 = (S_2 + S_6) C_6;  E_6 = D_6 - D_64;  F_6 = E_6;  S_7 X_7 = F_5 - F_6

A_7 = x_0 - x_7;  B_7 = A_7;  C_7 = B_7;  D_7 = C_7;  E_7 = D_7;  F_7 = E_7 - E_5;  S_3 X_3 = F_7 - F_4

D_64 = S_6 (C_6 + C_4)

Fig. 5. The AAN algorithm limited to indices 4-7 only, with a time-oriented structure. Adders, subtractors, multipliers and shift registers are marked in blue, gray, black and green, respectively. Red marks routines that require cascaded processes.
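The node equations of Fig. 5 can be checked numerically with a short software model. The following is a minimal Python sketch, not the FPGA implementation. It assumes that S_k = cos(k*pi/16), the usual constants of the 8-point AAN algorithm, and that X_k denotes twice the unnormalized DCT-II sum; neither definition is stated in this excerpt. The function names `aan_odd_fig5` and `direct_scaled` are hypothetical.

```python
import math

# Assumption: S_k = cos(k*pi/16), the usual 8-point AAN constants.
S = {k: math.cos(k * math.pi / 16) for k in range(1, 8)}

def aan_odd_fig5(x):
    """Evaluate the Fig. 5 node equations for one block x[0..7].

    Returns {k: S_k * X_k} for the odd indices k = 1, 3, 5, 7.
    """
    # Level A: input differences
    a4, a5, a6, a7 = x[3] - x[4], x[2] - x[5], x[1] - x[6], x[0] - x[7]
    # Level B: note the inverted sum in b4
    b4 = -a4 - a5
    b5 = a5 + a6
    b6 = a6 + a7
    b7 = a7
    # Level C: pure delays (shift registers in hardware)
    c4, c5, c6, c7 = b4, b5, b6, b7
    # Level D: multiplications; d64 is a cascade (add, then multiply)
    d4 = (S[2] - S[6]) * c4
    d5 = S[4] * c5
    d6 = (S[2] + S[6]) * c6
    d7 = c7
    d64 = S[6] * (c6 + c4)
    # Level E: note the inverted sum in e4
    e4 = -d64 - d4
    e5 = d5
    e6 = d6 - d64
    e7 = d7
    # Level F
    f4 = e4
    f5 = e5 + e7
    f6 = e6
    f7 = e7 - e5
    # Output level
    return {1: f5 + f6, 3: f7 - f4, 5: f7 + f4, 7: f5 - f6}

def direct_scaled(x, k):
    """Reference: 2 * S_k * sum_n x[n] * cos((2n+1) k pi / 16) (assumed normalization)."""
    u = sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16) for n in range(8))
    return 2 * S[k] * u

if __name__ == "__main__":
    x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]
    out = aan_odd_fig5(x)
    for k in (1, 3, 5, 7):
        print(k, out[k], direct_scaled(x, k))
```

Under these assumptions the two columns printed for each k agree to floating-point precision, which is a convenient sanity check on the transcription of the graph.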

A direct implementation of the pure AAN algorithm requires 7 pipeline stages, which consume additional shift-register resources to synchronize pass-through operations such as X(t+1) = X(t). In a numerical calculation on a processor, the data simply wait for the next execution cycle.

The D_64 block contains a cascade of a sum and a multiplication. Implementing this cascade within a single clock cycle in an FPGA logic block significantly reduces the speed. Additionally, the lpm_add_sub mega-function from the Altera® library of parameterized modules (LPM) does not support the inversion of a sum, i.e. B_4 = -(A_4 + A_5) or E_4 = -(D_64 + D_4). These operations would have to be performed as a cascade of an adder and a sign inversion. Cascaded operations performed in the same clock cycle significantly slow down the global registered performance.
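The need for synchronizing shift registers can be illustrated with a small clocked model of rows 5 and 7 of Fig. 5. This is a hedged Python sketch rather than HDL: the function `run` and its `align` switch are hypothetical, S_4 = cos(4*pi/16) is an assumption, and each variable stands for one register that is updated once per loop iteration (one clock).

```python
import math

S4 = math.cos(4 * math.pi / 16)  # assumed value of the S_4 constant

def run(blocks, align=True):
    """Clock a stream of (A_5, A_6, A_7) triples through rows 5 and 7 of Fig. 5.

    With align=True, A_7 travels through the delay registers B_7..E_7.
    With align=False, A_7 is fed straight to the F_5 adder (no delays).
    Returns the value seen at the F_5 adder output on every clock.
    """
    b5 = c5 = d5 = e5 = 0.0
    b7 = c7 = d7 = e7 = 0.0
    out = []
    flush = [(0.0, 0.0, 0.0)] * 4          # extra clocks to drain the pipeline
    for a5, a6, a7 in list(blocks) + flush:
        # Combinational adder F_5 = E_5 + E_7 reads the current register outputs
        out.append(e5 + (e7 if align else a7))
        # Clock edge: all registers update at once from the old values
        b5, c5, d5, e5 = a5 + a6, b5, S4 * c5, d5
        b7, c7, d7, e7 = a7, b7, c7, d7
    return out

if __name__ == "__main__":
    blocks = [(1.0, 1.0, 10.0), (2.0, 2.0, 20.0)]
    print(run(blocks, align=True))   # each block's sum appears 4 clocks after its input
    print(run(blocks, align=False))  # A_7 of one block meets E_5 of another
```

In the aligned run the registers B_7 to E_7 do nothing except hold A_7 until the multiplied branch catches up, which is exactly the X(t+1) = X(t) operation; without them the adder combines samples from different input blocks.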

A_4 = x_3 - x_4;  B_4 = A_4 + A_5;  C_4 = B_4;  D_4 = (S_2 - S_6) C_4;  E_4 = D_4 - D_64;  S_5 X_5 = E_7 + E_4

A_5 = x_2 - x_5;  B_5 = A_5 + A_6;  C_5 = B_5;  D_5 = S_4 C_5;  E_5 = D_7 + D_5;  S_1 X_1 = E_5 + E_6

A_6 = x_1 - x_6;  B_6 = A_6 + A_7;  C_6 = B_6;  D_6 = (S_2 + S_6) C_6;  E_6 = D_6 - D_64;  S_7 X_7 = E_5 - E_6

A_7 = x_0 - x_7;  B_7 = A_7;  C_7 = B_7;  D_7 = C_7;  E_7 = D_7 - D_5;  S_3 X_3 = E_7 - E_4

C_64 = B_6 - B_4;  D_64 = S_6 C_64

Fig. 6. Optimized AAN algorithm for indices 4-7. A redefinition and splitting of variables allowed a reduction of the chain length.

A simple redefinition of nodes removes the difficulties mentioned above. The B_4 node, now defined as the plain sum of the A_4 and A_5 nodes, requires only a simple lpm_add_sub mega-function. Because the D_4 node now has an inverted sign, E_4 can also be implemented with lpm_add_sub performing a subtraction. The D_64 node from Fig. 5 is split into the subtraction C_64, followed by the multiplication D_64 in the next clock cycle (Fig. 6).
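The redefined graph of Fig. 6 can be written out level by level to confirm that every node is now a single addition, subtraction, multiplication or delay, and that the outputs are unchanged. This is a minimal Python sketch under the same assumptions as before, which are not stated in this excerpt: S_k = cos(k*pi/16), and X_k taken as twice the unnormalized DCT-II sum. The names `aan_odd_fig6` and `reference` are hypothetical.

```python
import math

# Assumption: S_k = cos(k*pi/16) (standard 8-point AAN constants).
S2, S4, S6 = (math.cos(k * math.pi / 16) for k in (2, 4, 6))

def aan_odd_fig6(x):
    """Evaluate the Fig. 6 node equations; returns {k: S_k * X_k} for k = 1, 3, 5, 7."""
    # Level A: four subtractions
    a4, a5, a6, a7 = x[3] - x[4], x[2] - x[5], x[1] - x[6], x[0] - x[7]
    # Level B: plain sums only, no sign inversion
    b4, b5, b6, b7 = a4 + a5, a5 + a6, a6 + a7, a7
    # Level C: delays, plus the subtraction split out of the former D_64 cascade
    c4, c5, c6, c7 = b4, b5, b6, b7
    c64 = b6 - b4
    # Level D: at most one multiplication per node
    d4 = (S2 - S6) * c4
    d5 = S4 * c5
    d6 = (S2 + S6) * c6
    d7 = c7
    d64 = S6 * c64
    # Level E: plain additions and subtractions
    e4 = d4 - d64
    e5 = d7 + d5
    e6 = d6 - d64
    e7 = d7 - d5
    # Output level
    return {1: e5 + e6, 3: e7 - e4, 5: e7 + e4, 7: e5 - e6}

def reference(x, k):
    """Reference: 2 * cos(k pi/16) * sum_n x[n] * cos((2n+1) k pi / 16)."""
    u = sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16) for n in range(8))
    return 2 * math.cos(k * math.pi / 16) * u

if __name__ == "__main__":
    x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]
    out = aan_odd_fig6(x)
    for k in (1, 3, 5, 7):
        print(k, out[k], reference(x, k))
```

Counting the levels in this sketch gives A, B, C, D, E and the output adders, one level fewer than the A to F chain of Fig. 5, because the C level, which previously only delayed data, now also hosts the C_64 subtraction.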
