$$1 - \mu_{\mathrm{opt}}\lambda_{\min} = -\left(1 - \mu_{\mathrm{opt}}\lambda_{\max}\right), \tag{3.16}$$

which leads to the solution

$$\mu_{\mathrm{opt}} = \frac{2}{\lambda_{\max} + \lambda_{\min}}. \tag{3.17}$$
Although the two slowest modes will have the same speed, the mode $1 - \mu_{\mathrm{opt}}\lambda_{\min}$ will be positive, leading to an overdamped convergence where the trajectory to the minimum follows a continuous path. On the other hand, the negative mode $1 - \mu_{\mathrm{opt}}\lambda_{\max}$ leads to an underdamped convergence where the trajectory exhibits oscillations. If we use the ratio between the largest and smallest eigenvalues, which is known as the condition number (or eigenvalue spread) of $\mathbf{R}_x$ [2] and is denoted $\chi(\mathbf{R}_x)$, the magnitude of the slowest modes can be put as

$$\alpha = \frac{\chi(\mathbf{R}_x) - 1}{\chi(\mathbf{R}_x) + 1}. \tag{3.18}$$
When $\lambda_{\max} = \lambda_{\min}$, $\alpha = 0$, so it takes one iteration to converge. As the condition number increases, so does $\alpha$, becoming close to 1 for large condition numbers, which corresponds to a slow convergence mode. Therefore, $\chi(\mathbf{R}_x)$ plays a critical role in limiting the convergence speed of the SD algorithm.
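As a quick numerical illustration, the short sketch below evaluates $\mu_{\mathrm{opt}}$ from (3.17) and the slowest-mode magnitude $\alpha$ from (3.18) for a hypothetical autocorrelation matrix; the matrix `Rx` and the printed names are illustrative only and are not taken from the text.

```python
import numpy as np

# Hypothetical 2x2 autocorrelation matrix (illustrative values only).
Rx = np.array([[1.0, 0.8],
               [0.8, 1.0]])

# Eigenvalues of Rx determine the modes of the SD algorithm.
eigvals = np.linalg.eigvalsh(Rx)
lam_min, lam_max = eigvals.min(), eigvals.max()

# Optimal step size, Eq. (3.17), and condition number (eigenvalue spread).
mu_opt = 2.0 / (lam_max + lam_min)
chi = lam_max / lam_min

# Magnitude of the slowest modes, Eq. (3.18): the closer to 1, the slower the convergence.
alpha = (chi - 1.0) / (chi + 1.0)

print(f"mu_opt = {mu_opt:.4f}, chi(Rx) = {chi:.2f}, alpha = {alpha:.4f}")
```

For this example $\chi(\mathbf{R}_x) = 9$, so $\alpha = 0.8$: even with the optimal step size, the slowest modes shrink by only 20% per iteration.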
In practice, it is usual to choose $\mu$ in such a way that $\mu\lambda_i \ll 1$. Although this leads to a slower overdamped convergence, it might mitigate the effects that appear when the error surface is not fully known and needs to be measured or estimated from the available data (as will be the case with the stochastic gradient algorithms studied in Chap. 4). Under this condition, the fastest mode is associated with the largest eigenvalue, while the slowest mode is associated with the smallest eigenvalue. Actually, since the step size bound is related to $\lambda_{\max}$, the slowest mode, associated with $\lambda_{\min}$, can be put as $1 - a/\chi(\mathbf{R}_x)$, with $0 < a < 1$. Then, the larger the condition number, the slower the convergence of the algorithm. In any case, as long as the condition (3.13) is fulfilled, the SD algorithm applied to the error surface $J_{\mathrm{MSE}}(\mathbf{w})$ will reach the global minimum (where the MSE takes the value $J_{\mathrm{MMSE}}$) regardless of the chosen initial condition.
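To make the convergence claim concrete, here is a minimal steepest-descent sketch on a quadratic MSE surface. The matrix `Rx`, the cross-correlation vector `p`, the initial point, and the step-size choice are all hypothetical, and the recursion is written in the common form whose error modes are $(1 - \mu\lambda_i)$; condition (3.13) is assumed to be the usual bound $0 < \mu < 2/\lambda_{\max}$, consistent with the text's remark that the step-size bound is related to $\lambda_{\max}$.

```python
import numpy as np

# Hypothetical quadratic MSE surface: minimum at the Wiener solution w_o = Rx^{-1} p.
Rx = np.array([[1.0, 0.8],
               [0.8, 1.0]])          # illustrative autocorrelation matrix
p = np.array([0.5, -0.2])            # illustrative cross-correlation vector
w_o = np.linalg.solve(Rx, p)         # global minimum of the MSE surface

lam_max = np.linalg.eigvalsh(Rx).max()
mu = 1.0 / lam_max                   # any 0 < mu < 2/lam_max keeps |1 - mu*lam_i| < 1

w = np.array([10.0, -7.0])           # arbitrary initial condition
for n in range(200):
    # Steepest-descent update whose error modes are (1 - mu*lam_i).
    w = w + mu * (p - Rx @ w)

print("final w:", w, " Wiener solution:", w_o)
```

Regardless of the initial vector chosen above, the iterates approach `w_o`; only the number of iterations needed changes with the condition number of `Rx`.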
Finally, the step size can be turned into a time-varying sequence to improve the algorithm performance, e.g., choosing $\mu(n)$ to minimize $J_{\mathrm{MSE}}(\mathbf{w}(n))$. However, the stability analysis needs to be revised. One possible sufficient condition for convergence would require the sequence $\{\mu_i\}$ to be square summable but not summable [3], e.g., $\mu_i = 1/(i+1)$.
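As a quick check (these are standard series facts, not taken from the text), the example sequence $\mu_i = 1/(i+1)$ indeed satisfies both requirements:

$$\sum_{i=0}^{\infty} \frac{1}{i+1} = \infty \quad \text{(not summable)}, \qquad \sum_{i=0}^{\infty} \frac{1}{(i+1)^2} = \frac{\pi^2}{6} < \infty \quad \text{(square summable)}.$$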
3.3 Newton-Raphson Method
Consider again the update (3.2). We showed that for any positive definite matrix A, this recursion will converge to a minimum of the cost function. If we assume that the Hessian matrix of the cost function is positive definite, we can choose its inverse
 