$$1 - \mu_{\mathrm{opt}}\lambda_{\min} = -\left(1 - \mu_{\mathrm{opt}}\lambda_{\max}\right), \tag{3.16}$$

which leads to the solution

$$\mu_{\mathrm{opt}} = \frac{2}{\lambda_{\max} + \lambda_{\min}}. \tag{3.17}$$
Although the two slowest modes will have the same speed, the mode $1 - \mu_{\mathrm{opt}}\lambda_{\min}$ will be positive, leading to an overdamped convergence where the trajectory to the minimum follows a continuous path. On the other hand, the negative mode $1 - \mu_{\mathrm{opt}}\lambda_{\max}$ leads to an underdamped convergence where the trajectory exhibits oscillations. If we use the ratio between the largest and smallest eigenvalues, which is known as the condition number (or eigenvalue spread) of $\mathbf{R}_x$ [2] and is denoted $\chi(\mathbf{R}_x)$, the magnitude of the slowest modes can be put as

$$\alpha = \frac{\chi(\mathbf{R}_x) - 1}{\chi(\mathbf{R}_x) + 1}. \tag{3.18}$$
When $\lambda_{\max} = \lambda_{\min}$, $\alpha = 0$, so it takes one iteration to converge. As the condition number increases, so does $\alpha$, becoming close to 1 for large condition numbers, which corresponds to a slow convergence mode. Therefore, $\chi(\mathbf{R}_x)$ plays a critical role in limiting the convergence speed of the SD algorithm.
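As a quick numerical illustration, the short sketch below evaluates $\mu_{\mathrm{opt}}$ from (3.17) and the slowest-mode magnitude $\alpha$ from (3.18) for a hypothetical autocorrelation matrix; the matrix `Rx` and the printed names are illustrative only and are not taken from the text.

```python
import numpy as np

# Hypothetical 2x2 autocorrelation matrix (illustrative values only).
Rx = np.array([[1.0, 0.8],
               [0.8, 1.0]])

# Eigenvalues of Rx determine the modes of the SD algorithm.
eigvals = np.linalg.eigvalsh(Rx)
lam_min, lam_max = eigvals.min(), eigvals.max()

# Optimal step size, Eq. (3.17), and condition number (eigenvalue spread).
mu_opt = 2.0 / (lam_max + lam_min)
chi = lam_max / lam_min

# Magnitude of the slowest modes, Eq. (3.18): the closer to 1, the slower the convergence.
alpha = (chi - 1.0) / (chi + 1.0)

print(f"mu_opt = {mu_opt:.4f}, chi(Rx) = {chi:.2f}, alpha = {alpha:.4f}")
```

For this example $\chi(\mathbf{R}_x) = 9$, so $\alpha = 0.8$: even with the optimal step size, the slowest modes shrink by only 20% per iteration.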
In practice, it is usual to choose $\mu$ in such a way that $\mu\lambda_i \ll 1$. Although this leads to a slower overdamped convergence, it might mitigate the effects that appear when the error surface is not fully known and needs to be measured or estimated from the available data (as will be the case with the stochastic gradient algorithms studied in Chap. 4). Under this condition, the fastest mode is associated with the largest eigenvalue, while the slowest mode is associated with the smallest eigenvalue. Actually, since the step size bound is related to $\lambda_{\max}$, the slowest mode, associated with $\lambda_{\min}$, can be put as $1 - a/\chi(\mathbf{R}_x)$, with $0 < a < 1$. Then, the larger the condition number, the slower the convergence of the algorithm. In any case, as long as the condition (3.13) is fulfilled, the SD algorithm applied to the error surface $J_{\mathrm{MSE}}(\mathbf{w})$ will reach the global minimum (where the MSE takes the value $J_{\mathrm{MMSE}}$) regardless of the chosen initial condition.
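To make the convergence claim concrete, here is a minimal steepest-descent sketch on a quadratic MSE surface. The matrix `Rx`, the cross-correlation vector `p`, the initial point, and the step-size choice are all hypothetical, and the recursion is written in the common form whose error modes are $(1 - \mu\lambda_i)$; condition (3.13) is assumed to be the usual bound $0 < \mu < 2/\lambda_{\max}$, consistent with the text's remark that the step-size bound is related to $\lambda_{\max}$.

```python
import numpy as np

# Hypothetical quadratic MSE surface: minimum at the Wiener solution w_o = Rx^{-1} p.
Rx = np.array([[1.0, 0.8],
               [0.8, 1.0]])          # illustrative autocorrelation matrix
p = np.array([0.5, -0.2])            # illustrative cross-correlation vector
w_o = np.linalg.solve(Rx, p)         # global minimum of the MSE surface

lam_max = np.linalg.eigvalsh(Rx).max()
mu = 1.0 / lam_max                   # any 0 < mu < 2/lam_max keeps |1 - mu*lam_i| < 1

w = np.array([10.0, -7.0])           # arbitrary initial condition
for n in range(200):
    # Steepest-descent update whose error modes are (1 - mu*lam_i).
    w = w + mu * (p - Rx @ w)

print("final w:", w, " Wiener solution:", w_o)
```

Regardless of the initial vector chosen above, the iterates approach `w_o`; only the number of iterations needed changes with the condition number of `Rx`.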
Finally, the step size can be turned into a time-varying sequence to improve the algorithm performance, e.g., choosing $\mu(n)$ to minimize $J_{\mathrm{MSE}}(\mathbf{w}(n))$. However, the stability analysis needs to be revised. One possible sufficient condition for convergence would require the sequence $\{\mu_i\}$ to be square summable but not summable [3], e.g., $\mu_i = 1/(i+1)$.
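As a quick check (these are standard series facts, not taken from the text), the example sequence $\mu_i = 1/(i+1)$ indeed satisfies both requirements:

$$\sum_{i=0}^{\infty} \frac{1}{i+1} = \infty \quad \text{(not summable)}, \qquad \sum_{i=0}^{\infty} \frac{1}{(i+1)^2} = \frac{\pi^2}{6} < \infty \quad \text{(square summable)}.$$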
3.3 Newton-Raphson Method
Consider again the update (3.2). We showed that for any positive definite matrix A, this recursion will converge to a minimum of the cost function. If we assume that the Hessian matrix of the cost function is positive definite, we can choose its inverse
 