as the matrix A, i.e.,

A = [\nabla^2_w J_{MSE}(w(n-1))]^{-1}.    (3.19)
Then, the new recursion, starting with an initial guess w(-1), is

w(n) = w(n-1) - \mu \, [\nabla^2_w J_{MSE}(w(n-1))]^{-1} \, \nabla_w J_{MSE}(w(n-1)).    (3.20)
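As a concrete illustration of one update of (3.20), the following minimal sketch performs a single step; the callables grad_J and hess_J are hypothetical placeholders for routines returning the gradient and Hessian of the cost at a given point.

```python
import numpy as np

def nr_step(w_prev, grad_J, hess_J, mu=1.0):
    """One update as in (3.20): w(n) = w(n-1) - mu * [Hessian]^{-1} * gradient,
    with both the gradient and the Hessian evaluated at w(n-1)."""
    g = grad_J(w_prev)                    # gradient of the cost at w(n-1)
    H = hess_J(w_prev)                    # Hessian of the cost at w(n-1)
    # Solve H * delta = g instead of forming the explicit inverse of H.
    return w_prev - mu * np.linalg.solve(H, g)
```

Solving the linear system rather than explicitly inverting the Hessian is the usual, numerically safer way to implement this step.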
This is known as the Newton-Raphson (NR) method since it is related to the method
for finding the zeros of a function. To understand this relation, consider the univariate
case where we want to find the zeros of a function f(w), i.e., the solution to f(w) = 0. Starting with an initial guess w_0, define the next estimate w_1 as the point where the tangent to f at w_0 intersects the w-axis. Then, as the derivative f'(w_0) is the slope of the tangent at w_0, it holds that

w_1 = w_0 - \frac{f(w_0)}{f'(w_0)}.    (3.21)
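Spelling out the tangent construction behind (3.21): the tangent line to f at w_0 is

t(w) = f(w_0) + f'(w_0)\,(w - w_0),

and setting t(w_1) = 0 and solving for w_1 gives exactly (3.21).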
This will be applied recursively until we reach a zero of the function (so the update
becomes null). Now, if we consider the function f(w) = \nabla_w J_{MSE}(w), when we find its zero we will find the minimum of J_MSE(w). Extending the previous idea to the multidimensional case will lead to the NR recursion (3.20), where the step size μ is included as before to control the magnitude of the update.
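To make the connection concrete, the sketch below (with an arbitrarily chosen cost, not one from the text) applies the univariate NR iteration (3.21) to f(w) = J'(w), so that the zero it finds is the minimizer of J.

```python
import math

def newton_root(f, f_prime, w0, n_iter=50, tol=1e-12):
    """Univariate Newton-Raphson for f(w) = 0, following (3.21)."""
    w = w0
    for _ in range(n_iter):
        step = f(w) / f_prime(w)       # f(w_k) / f'(w_k)
        w = w - step
        if abs(step) < tol:            # stop once the update becomes (numerically) null
            break
    return w

# Example cost J(w) = exp(w) - 2w, minimized where f(w) = J'(w) = exp(w) - 2 vanishes.
f  = lambda w: math.exp(w) - 2.0       # J'(w)
fp = lambda w: math.exp(w)             # J''(w)
print(newton_root(f, fp, w0=0.0))      # approx. log(2) = 0.6931..., the minimizer of J
```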
In applying the NR method to J_MSE(w), notice that from (3.6) the Hessian is given by R_x, so (3.20) takes the form

w(n) = w(n-1) + \mu \, R_x^{-1} [r_{xd} - R_x w(n-1)] = w(n-1) + \mu \, [w_{opt} - w(n-1)].    (3.22)
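A quick numerical check of (3.22), using made-up second-order statistics R_x and r_xd (any positive-definite R_x would do), shows the NR update landing on the Wiener solution w_opt = R_x^{-1} r_xd in a single step when μ = 1.

```python
import numpy as np

# Illustrative (made-up) second-order statistics.
R_x  = np.array([[2.0, 0.5],
                 [0.5, 1.0]])          # input autocorrelation matrix
r_xd = np.array([1.0, 0.3])            # cross-correlation between input and desired signal
w_opt = np.linalg.solve(R_x, r_xd)     # Wiener solution: R_x w_opt = r_xd

mu = 1.0
w = np.zeros(2)                        # initial guess w(-1)
# One NR iteration, Eq. (3.22): w(n) = w(n-1) + mu * R_x^{-1} [r_xd - R_x w(n-1)]
w = w + mu * np.linalg.solve(R_x, r_xd - R_x @ w)
print(np.allclose(w, w_opt))           # True: with mu = 1 the minimum is reached in one step
```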
The SD method performs the update in the direction of the gradient, so it is normal to the elliptic contour curves of the error surface. However, (3.22) shows that the NR method moves directly in the direction pointing towards the minimum. So the gradient is rotated as it is multiplied by R_x^{-1}. For the gradient to point directly towards the minimum, the point where the gradient is being evaluated must lie on the direction of one of the eigenvectors of R_x, or the contour curves of the error surface need to be circular (so R_x ∝ I_L).
Moreover, if we choose μ = 1, the minimum is reached in one iteration of the NR algorithm regardless of the initial guess. As true as this is, when the NR method is used in practice, errors in the computation/estimation of the gradient and/or Hessian are very likely to arise. The use of μ < 1 will give better control over the convergence of the algorithm. In any case, as it will be useful in the study of adaptive filters in the next chapter, it is important to see how this algorithm behaves with a stable μ different from 1.
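One way to quantify this behaviour, implicit in (3.22), is to subtract w_opt from both sides of the recursion:

w(n) - w_{opt} = (1 - \mu)\,[w(n-1) - w_{opt}] = (1 - \mu)^{\,n+1}\,[w(-1) - w_{opt}],

so the weight error decays geometrically, the recursion is stable for |1 - μ| < 1 (i.e., 0 < μ < 2), and μ = 1 recovers the one-step convergence noted above.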