Fig. 3.5 Same as in Fig. 3.3 but for the Newton-Raphson algorithm [figure: panels (a)-(c) for χ(R) = 10 with parameter values 0.1, 1.5, and 2.05, with axes labeled w1(n); learning curves for χ(R) = 1, 2, and 10 versus iteration number]
3.5 Further Comments
In [4], there is an interesting interpretation of the NR method that further clarifies why its convergence is independent of the eigenvalue spread. The result is that the NR algorithm works as an SD algorithm operating on an input signal generated by applying the Karhunen-Loève transform (which decorrelates the input signal) followed by a power normalization procedure, a combination known as a whitening process.
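To make this interpretation concrete, the following NumPy sketch (an illustration under the usual quadratic MSE setup, not code from [4]) verifies numerically that an NR step on the original coefficients coincides with an SD step taken in coordinates whitened by the Karhunen-Loève transform plus power normalization; the matrix R, vector p, coefficients w, and step size mu are stand-ins for the usual quantities.

```python
import numpy as np

# Quadratic MSE cost: gradient = 2(Rw - p), Hessian = 2R.
rng = np.random.default_rng(0)
L = 4
A = rng.standard_normal((L, L))
R = A @ A.T + L * np.eye(L)        # input autocorrelation matrix (SPD)
p = rng.standard_normal(L)         # cross-correlation vector
w = rng.standard_normal(L)         # current coefficient vector
mu = 0.5                           # step size

grad = 2 * (R @ w - p)

# Newton-Raphson: scale the gradient by the inverse Hessian (2R).
w_nr = w - mu * np.linalg.solve(2 * R, grad)

# Same step via whitening: the KLT (eigenvectors of R) decorrelates the
# input, and dividing by the square root of each eigenvalue normalizes
# the power, so the transformed autocorrelation T.T @ R @ T is the identity.
lam, Q = np.linalg.eigh(R)
T = Q / np.sqrt(lam)                   # KLT + power normalization (whitening)
w_t = np.linalg.solve(T, w)            # coefficients in whitened coordinates
p_t = T.T @ p                          # cross-correlation in whitened coordinates
grad_t = 2 * (w_t - p_t)               # gradient there (Hessian is 2I)
w_sd = T @ (w_t - 0.5 * mu * grad_t)   # SD step (step mu/2), mapped back

print(np.allclose(w_nr, w_sd))         # True: the two updates are identical
```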
The SD method presents a very slow convergence rate in the vicinity of the optimal solution, a drawback that the NR method overcomes. However, the latter does not take as much advantage of the large gradients at points far away from the minimum as the SD method does. To improve this tradeoff, the Levenberg-Marquardt (LM) algorithm [5] comes as a combination of the SD and NR algorithms, trying to share the merits of both methods. It basically adds a time-dependent regularization constant β(n) to the NR algorithm and, instead of multiplying it by the identity matrix I_L, uses the diagonal of the Hessian matrix. This change allows each component of the gradient to be scaled independently, providing larger movement along the directions where the gradient is smaller. At the beginning of the iteration process, a large β(n) is used, making the algorithm move closer to the direction given by the gradient, i.e., operate as an SD algorithm. As we get closer to the minimum, β(n) can be decreased, bringing the algorithm closer to the NR method. In addition, the use of regularization provides numerical stability to the NR method, as mentioned earlier.
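The update described above can be sketched as follows for the quadratic MSE cost, where the Hessian is 2R; the function name, the step size mu, and the geometric schedule suggested for β(n) are illustrative assumptions, not taken from [5].

```python
import numpy as np

def lm_step(w, R, p, beta_n, mu=1.0):
    """One Levenberg-Marquardt-style step for the MSE cost (sketch).

    Large beta_n: the diagonal term dominates, so the update follows a
    component-wise scaled gradient (SD-like behaviour).
    Small beta_n: the update approaches the Newton-Raphson step.
    """
    grad = 2 * (R @ w - p)                    # gradient of the MSE cost
    H = 2 * R                                 # Hessian of the MSE cost
    # Regularize with the diagonal of the Hessian rather than beta_n * I,
    # so each gradient component is scaled by its own curvature.
    M = H + beta_n * np.diag(np.diag(H))
    return w - mu * np.linalg.solve(M, grad)
```

A simple (again illustrative) schedule is to start from a large value and shrink it geometrically, e.g. beta_n = beta_0 * r**n with 0 < r < 1, so the iterations behave like SD far from the minimum and approach NR near it.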
The attempts to reduce the computational cost of the NR method led to the family of quasi-Newton methods [6]. The idea behind them is to approximate the Hessian matrix or its inverse in terms of the gradient. Some examples include the Broyden-Fletcher-Goldfarb-Shanno method (which, as the LM method, performs initially as an SD algorithm).
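As a generic illustration of this idea (not the derivation in [6]), the BFGS method keeps an approximation of the inverse Hessian and refreshes it using only differences of coefficient vectors and gradients, so the true Hessian is never formed or inverted; the function name below is an assumption made for the sketch.

```python
import numpy as np

def bfgs_inverse_hessian_update(H_inv, s, y):
    """Standard BFGS update of an inverse-Hessian approximation (sketch).

    s = w_new - w_old        (change in the coefficient vector)
    y = grad_new - grad_old  (change in the gradient)
    """
    rho = 1.0 / (y @ s)                 # curvature term, assumed positive
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    # Rank-two correction built only from gradient information.
    return V @ H_inv @ V.T + rho * np.outer(s, s)
```

Starting from H_inv = I, the first step taken with this approximation follows the negative gradient, which is the SD-like initial behaviour mentioned above.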
 