[Fig. 5: node definitions recovered from the figure, grouped by node letter]
A_4 = x_3 - x_4        A_5 = x_2 - x_5        A_6 = x_1 - x_6        A_7 = x_0 - x_7
B_4 = -A_4 - A_5       B_5 = A_5 + A_6        B_6 = A_6 + A_7        B_7 = A_7
C_4 = B_4              C_5 = B_5              C_6 = B_6              C_7 = B_7
D_4 = (S_2 - S_6) C_4  D_5 = S_4 C_5          D_6 = (S_2 + S_6) C_6  D_7 = C_7
D_{64} = S_6 (C_6 + C_4)
E_4 = -D_{64} - D_4    E_5 = D_5              E_6 = D_6 - D_{64}     E_7 = D_7
F_4 = E_4              F_5 = E_5 + E_7        F_6 = E_6              F_7 = E_7 - E_5
S_1 X_1 = F_5 + F_6    S_3 X_3 = F_7 - F_4    S_5 X_5 = F_7 + F_4    S_7 X_7 = F_5 - F_6
Fig. 5. The AAN algorithm limited to indices 4-7 only, with a time-oriented structure.
Adders, subtractors, multipliers and shift registers are marked in the following colours:
blue, gray, black and green, respectively. Red marks routines that require cascaded
operations.
A direct implementation of the pure AAN algorithm requires 7 pipeline stages, which consume
additional shift-register resources to keep the data synchronized in operations such as
X(t+1) = X(t). In a numerical calculation on a processor, such data simply wait for the next
execution cycle. The D_{64} block contains a cascade of a sum and a multiplication.
Implementing this cascade in a single clock cycle of FPGA logic significantly reduces the
speed. Additionally, the lpm_add_sub mega-function from the Altera® library of parameterized
modules (LPM) does not support the inversion of a sum, i.e. B_4 = -(A_4 + A_5) or
E_4 = -(D_{64} + D_4). These operations would have to be performed as a cascade of an adder
and a sign inversion. Cascaded operations performed in the same clock cycle significantly
slow down the overall registered performance.
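The Fig. 5 dataflow can be followed in software with a minimal Python sketch. It assumes that each node letter (A to F, then X) corresponds to one pipeline stage, and that S_k = cos(k*pi/16); the excerpt does not define the S constants, so this value is an assumption chosen because it reproduces the usual AAN multipliers. The function names and the reference DCT convention (X_k = 2 * sum of x_n cos((2n+1) k pi / 16)) are also illustrative choices, not taken from the text.

```python
import math

# Assumption: S[k] = cos(k*pi/16); the excerpt does not define the S constants.
S = {k: math.cos(k * math.pi / 16) for k in range(1, 8)}


def aan_odd_fig5(x):
    """Evaluate the Fig. 5 nodes letter by letter; returns {k: S_k * X_k}."""
    # A: differences of mirrored input samples
    A4, A5, A6, A7 = x[3] - x[4], x[2] - x[5], x[1] - x[6], x[0] - x[7]
    # B: B4 is a negated sum, i.e. an adder followed by a sign inversion
    B4, B5, B6, B7 = -A4 - A5, A5 + A6, A6 + A7, A7
    # C: pure delays, X(t+1) = X(t), realised as shift registers in hardware
    C4, C5, C6, C7 = B4, B5, B6, B7
    # D: multiplications; D64 cascades a sum and a product in one step
    D4 = (S[2] - S[6]) * C4
    D5 = S[4] * C5
    D6 = (S[2] + S[6]) * C6
    D7 = C7
    D64 = S[6] * (C6 + C4)
    # E: E4 is again a negated sum
    E4, E5, E6, E7 = -D64 - D4, D5, D6 - D64, D7
    # F: two butterflies and two delays
    F4, F5, F6, F7 = E4, E5 + E7, E6, E7 - E5
    # X: scaled odd-index outputs
    return {1: F5 + F6, 3: F7 - F4, 5: F7 + F4, 7: F5 - F6}


def scaled_dct_reference(x, k):
    """Direct evaluation of S_k * X_k under the assumed DCT convention."""
    X_k = 2 * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16) for n in range(8))
    return S[k] * X_k
```

Calling `aan_odd_fig5(x)` on an 8-sample list and comparing each entry with `scaled_dct_reference(x, k)` for k in (1, 3, 5, 7) is a quick way to check that the recovered node list is self-consistent under these assumptions.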
[Fig. 6: node definitions recovered from the figure, grouped by node letter]
A_4 = x_3 - x_4        A_5 = x_2 - x_5        A_6 = x_1 - x_6        A_7 = x_0 - x_7
B_4 = A_4 + A_5        B_5 = A_5 + A_6        B_6 = A_6 + A_7        B_7 = A_7
C_4 = B_4              C_5 = B_5              C_6 = B_6              C_7 = B_7
C_{64} = B_6 - B_4
D_4 = (S_2 - S_6) C_4  D_5 = S_4 C_5          D_6 = (S_2 + S_6) C_6  D_7 = C_7
D_{64} = S_6 C_{64}
E_4 = D_4 - D_{64}     E_5 = D_7 + D_5        E_6 = D_6 - D_{64}     E_7 = D_7 - D_5
S_1 X_1 = E_5 + E_6    S_3 X_3 = E_7 - E_4    S_5 X_5 = E_7 + E_4    S_7 X_7 = E_5 - E_6
Fig. 6. Optimized AAN algorithm for indices 4-7. A redefinition and splitting of variables
allowed a reduction of the chain length.
A simple redefinition of nodes removes the difficulties mentioned above. The B_4 node, now
defined as the sum of the A_4 and A_5 nodes, requires only a simple lpm_add_sub
mega-function. The D_4 node, whose sign is now inverted, allows lpm_add_sub to be used in
E_4 to perform a subtraction. The D_{64} node from Fig. 5 can be split into the subtraction
C_{64} and the multiplication D_{64} in the next clock cycle (Fig. 6).
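The redefined nodes of Fig. 6 can be sketched in Python in the same spirit. As a working assumption, S_k = cos(k*pi/16) (the excerpt does not define the S constants), each node letter is treated as one pipeline stage, and the function names and the reference DCT convention (X_k = 2 * sum of x_n cos((2n+1) k pi / 16)) are illustrative only. Every step is now a single add, subtract, multiply or delay, and the F nodes of Fig. 5 no longer appear.

```python
import math

# Assumption: S[k] = cos(k*pi/16); the excerpt does not define the S constants.
S = {k: math.cos(k * math.pi / 16) for k in range(1, 8)}


def aan_odd_fig6(x):
    """Evaluate the Fig. 6 nodes letter by letter; returns {k: S_k * X_k}."""
    # A: differences of mirrored input samples
    A4, A5, A6, A7 = x[3] - x[4], x[2] - x[5], x[1] - x[6], x[0] - x[7]
    # B: B4 is a plain sum, so a single adder suffices
    B4, B5, B6, B7 = A4 + A5, A5 + A6, A6 + A7, A7
    # C: delays, plus the subtraction C64 split off from the old D64 cascade
    C4, C5, C6, C7 = B4, B5, B6, B7
    C64 = B6 - B4
    # D: one multiplication (or delay) per node, no cascades
    D4 = (S[2] - S[6]) * C4   # sign inverted relative to Fig. 5
    D5 = S[4] * C5
    D6 = (S[2] + S[6]) * C6
    D7 = C7
    D64 = S[6] * C64
    # E: plain additions and subtractions
    E4, E5, E6, E7 = D4 - D64, D7 + D5, D6 - D64, D7 - D5
    # X: scaled odd-index outputs
    return {1: E5 + E6, 3: E7 - E4, 5: E7 + E4, 7: E5 - E6}


def scaled_dct_reference(x, k):
    """Direct evaluation of S_k * X_k under the assumed DCT convention."""
    X_k = 2 * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16) for n in range(8))
    return S[k] * X_k
```

Comparing `aan_odd_fig6(x)` with `scaled_dct_reference(x, k)` for k in (1, 3, 5, 7) checks that the sign change in D_4 and the C_{64}/D_{64} split leave the outputs unchanged under these assumptions.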