as the matrix A, i.e.,

A = [\nabla^2_w J_{MSE}(w(n-1))]^{-1}.    (3.19)
Then, the new recursion, starting with an initial guess w(-1), is

w(n) = w(n-1) - \mu \, [\nabla^2_w J_{MSE}(w(n-1))]^{-1} \, \nabla_w J_{MSE}(w(n-1)).    (3.20)
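As a concrete illustration of one update of (3.20), the following minimal sketch performs a single step; the callables grad_J and hess_J are hypothetical placeholders for routines returning the gradient and Hessian of the cost at a given point.

```python
import numpy as np

def nr_step(w_prev, grad_J, hess_J, mu=1.0):
    """One update as in (3.20): w(n) = w(n-1) - mu * [Hessian]^{-1} * gradient,
    with both the gradient and the Hessian evaluated at w(n-1)."""
    g = grad_J(w_prev)                    # gradient of the cost at w(n-1)
    H = hess_J(w_prev)                    # Hessian of the cost at w(n-1)
    # Solve H * delta = g instead of forming the explicit inverse of H.
    return w_prev - mu * np.linalg.solve(H, g)
```

Solving the linear system rather than explicitly inverting the Hessian is the usual, numerically safer way to implement this step.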
This is known as the Newton-Raphson (NR) method since it is related to the method
for finding the zeros of a function. To understand this relation, consider the univariate
case where we want to find the zeros of a function f(w), i.e., the solution to f(w) = 0. Starting with an initial guess w_0, define the next estimate w_1 as the point where the tangent to f at w_0 intersects the w-axis. Then, as the derivative f'(w_0) is the slope of the tangent at w_0, it holds that

w_1 = w_0 - \frac{f(w_0)}{f'(w_0)}.    (3.21)
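Spelling out the tangent construction behind (3.21): the tangent line to f at w_0 is

t(w) = f(w_0) + f'(w_0)\,(w - w_0),

and setting t(w_1) = 0 and solving for w_1 gives exactly (3.21).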
This will be applied recursively until we reach a zero of the function (so the update
becomes null). Now, if we consider the function f(w) = \nabla_w J_{MSE}(w), when we find its zero we will find the minimum of J_MSE(w). Extending the previous idea to the multidimensional case will lead to the NR recursion (3.20), where the step size μ is included as before to control the magnitude of the update.
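To make the connection concrete, the sketch below (with an arbitrarily chosen cost, not one from the text) applies the univariate NR iteration (3.21) to f(w) = J'(w), so that the zero it finds is the minimizer of J.

```python
import math

def newton_root(f, f_prime, w0, n_iter=50, tol=1e-12):
    """Univariate Newton-Raphson for f(w) = 0, following (3.21)."""
    w = w0
    for _ in range(n_iter):
        step = f(w) / f_prime(w)       # f(w_k) / f'(w_k)
        w = w - step
        if abs(step) < tol:            # stop once the update becomes (numerically) null
            break
    return w

# Example cost J(w) = exp(w) - 2w, minimized where f(w) = J'(w) = exp(w) - 2 vanishes.
f  = lambda w: math.exp(w) - 2.0       # J'(w)
fp = lambda w: math.exp(w)             # J''(w)
print(newton_root(f, fp, w0=0.0))      # approx. log(2) = 0.6931..., the minimizer of J
```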
In applying the NR method to J_MSE(w), notice that from (3.6) the Hessian is given by R_x, so (3.20) takes the form

w(n) = w(n-1) + \mu \, R_x^{-1} [r_{xd} - R_x w(n-1)] = w(n-1) + \mu \, [w_{opt} - w(n-1)].    (3.22)
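A quick numerical check of (3.22), using made-up second-order statistics R_x and r_xd (any positive-definite R_x would do), shows the NR update landing on the Wiener solution w_opt = R_x^{-1} r_xd in a single step when μ = 1.

```python
import numpy as np

# Illustrative (made-up) second-order statistics.
R_x  = np.array([[2.0, 0.5],
                 [0.5, 1.0]])          # input autocorrelation matrix
r_xd = np.array([1.0, 0.3])            # cross-correlation between input and desired signal
w_opt = np.linalg.solve(R_x, r_xd)     # Wiener solution: R_x w_opt = r_xd

mu = 1.0
w = np.zeros(2)                        # initial guess w(-1)
# One NR iteration, Eq. (3.22): w(n) = w(n-1) + mu * R_x^{-1} [r_xd - R_x w(n-1)]
w = w + mu * np.linalg.solve(R_x, r_xd - R_x @ w)
print(np.allclose(w, w_opt))           # True: with mu = 1 the minimum is reached in one step
```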
The SD method performs the update in the direction of the gradient, so it is normal to the elliptic contour curves of the error surface. However, (3.22) shows that the NR method moves directly in the direction pointing towards the minimum. So the gradient is rotated as it is multiplied by R_x^{-1}. For the gradient to point directly towards the minimum, the point where the gradient is being evaluated must lie on the direction of one of the eigenvectors of R_x, or the contour curves of the error surface need to be circular (so R_x ∝ I_L).
Moreover, if we choose μ = 1, the minimum is reached in one iteration of the NR algorithm regardless of the initial guess. As true as this is, when the NR method is used in practice, errors in the computation/estimation of the gradient and/or Hessian are very likely to arise. The use of μ < 1 will give better control over the convergence of the algorithm. In any case, as it will be useful in the study of adaptive filters in the next chapter, it is important to see how this algorithm behaves with a stable μ different from 1.
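One way to quantify this behaviour, implicit in (3.22), is to subtract w_opt from both sides of the recursion:

w(n) - w_{opt} = (1 - \mu)\,[w(n-1) - w_{opt}] = (1 - \mu)^{\,n+1}\,[w(-1) - w_{opt}],

so the weight error decays geometrically, the recursion is stable for |1 - μ| < 1 (i.e., 0 < μ < 2), and μ = 1 recovers the one-step convergence noted above.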