Digital Signal Processing Reference
In-Depth Information
is called the Newton direction if
H
[
w
(
n
)] is nonsingular. Newton method converges
quadratically to a local optimum if
w
(0) is sufficiently close to this point and if the
Hessian is positive definite. However, the method faces difficulties when the quadratic
approximation is not a reasonable one at the current weight update and
/
or the Hessian
is not positive definite. Thus a number of modifications have been proposed to the
Newton method, such as performing a line search along the Newton direction,
rather than using the stepsize that minimizes the quadratic model assumption. More
importantly, a number of procedures are introduced that use an approximate
Hessian rather than the actual Hessian that allow better numerical properties. These
include the Davidon-Fletcher-Powell (DFP) method and the Broyden-Fletcher-
Goldfarb-Shanno (BFGS) method [82].
Another approach is to solve (1.22) iteratively, which is desirable also when
the dimensionality of the problem is high and
/
or the numerical properties of the
Hessian are known to be poor. For the task, we can employ the well known conjugate
gradient algorithm, which generates a sequence
d
(1),
d
(2),
...
,
d
(
k
) such that
d
(
k
)
converges to the optimal direction
2
(
H
[
w
(
n
)])
2
1
r
w
(
n
)
f
.
A set of nonzero vectors [
c
(0),
c
(1),
...
,
c
(
n
)] are said to be conjugate with respect
to a symmetric positive definite matrix
A
if
c
T
(
k
)
Ac
(
l
)
¼
0,
for all
k
=
l
where, in this case
A ¼ H
[
w
(
n
)].
It can be shown that for any
d
(0)
[ R
N
, the sequence
d
(
k
) generated by the conju-
gate direction algorithm as
d
(
k þ
1)
¼ d
(
k
)
þa
k
c
(
k
)
q
T
(
k
)
c
(
k
)
c
T
(
k
)
H
[
w
(
n
)]
c
(
k
)
q
(
k
)
¼r
w
(
n
)
f þH
[
w
(
n
)]
d
(
k
)
a
(
k
)
¼
converges to the optimal solution at most
N
steps. The question that remains is how
to construct the set of conjugate directions. Generally
c
(
k
) is selected to be a linear
combination of
q
(
k
) and the previous direction
c
(
k
2
1) as
c
(
k
)
¼q
(
k
)
þb
(
k
)
c
(
k
1)
where
q
T
(
k
)
H
[
w
(
n
)]
c
(
k
1)
c
T
(
k
1)
H
[
w
(
n
)]
c
(
k
1)
b
(
k
)
¼
is determined by the constraint that
c
(
k
) and
c
(
k
2
1) must be conjugate to the
Hessian matrix.
Search WWH ::
Custom Search