Digital Signal Processing Reference
In-Depth Information
is called the Newton direction if H [ w ( n )] is nonsingular. Newton method converges
quadratically to a local optimum if w (0) is sufficiently close to this point and if the
Hessian is positive definite. However, the method faces difficulties when the quadratic
approximation is not a reasonable one at the current weight update and / or the Hessian
is not positive definite. Thus a number of modifications have been proposed to the
Newton method, such as performing a line search along the Newton direction,
rather than using the stepsize that minimizes the quadratic model assumption. More
importantly, a number of procedures are introduced that use an approximate
Hessian rather than the actual Hessian that allow better numerical properties. These
include the Davidon-Fletcher-Powell (DFP) method and the Broyden-Fletcher-
Goldfarb-Shanno (BFGS) method [82].
Another approach is to solve (1.22) iteratively, which is desirable also when
the dimensionality of the problem is high and / or the numerical properties of the
Hessian are known to be poor. For the task, we can employ the well known conjugate
gradient algorithm, which generates a sequence d (1), d (2), ... , d ( k ) such that d ( k )
converges to the optimal direction 2 ( H [ w ( n )]) 2 1
r w ( n ) f .
A set of nonzero vectors [ c (0), c (1), ... , c ( n )] are said to be conjugate with respect
to a symmetric positive definite matrix A if
c T ( k ) Ac ( l ) ¼ 0,
for all k = l
where, in this case A ¼ H [ w ( n )].
It can be shown that for any d (0) [ R
N , the sequence d ( k ) generated by the conju-
gate direction algorithm as
d ( k þ 1) ¼ d ( k ) þa k c ( k )
q T ( k ) c ( k )
c T ( k ) H [ w ( n )] c ( k )
q ( k ) ¼r w ( n ) f þH [ w ( n )] d ( k )
a ( k ) ¼
converges to the optimal solution at most N steps. The question that remains is how
to construct the set of conjugate directions. Generally c ( k ) is selected to be a linear
combination of q ( k ) and the previous direction c ( k 2 1) as
c ( k ) ¼q ( k ) þb ( k ) c ( k 1)
where
q T ( k ) H [ w ( n )] c ( k 1)
c T ( k 1) H [ w ( n )] c ( k 1)
b ( k ) ¼
is determined by the constraint that c ( k ) and c ( k 2 1) must be conjugate to the
Hessian matrix.
 
Search WWH ::




Custom Search