Information Technology Reference
In-Depth Information
for n odd: fr n ( x )= xP ( n− 3) / 2 ( x ), with, for n
6: P ( n
3) / 2( x )=
1+ ( n− 3) / 2
k =1
[1 / (2 k )((2 k
x ) k ]; for n =3: P 0 ( x ) = 1; for n
1)!!) / ( k !)(1
=1: P 1 ( x )=0.
2.10.5 Optimization Methods: Levenberg-Marquardt and BFGS
That presentation is from [Oussar 1998]. The algorithms are also described in
[Press et al. 1992].
2.10.5.1 The BFGS Algorithm
The BFGS algorithm consists in updating the parameters, at iteration i ,by:
w ( i )= w ( i
1)) where µ i is positive, and where M i is
an approximation, computed iteratively, of the inverse of the Hessian matrix;
the latter is computed, at each iteration, by
M i = M i− 1 + 1+ γ T
1)
µ i M i J ( w ( i
δ T
δ i− 1 T
i− 1 M i− 1 + M i− 1 γ i− 1 T
i− 1 M i− 1 γ i− 1
δ T
i−
i− 1 δ i− 1
δ T
i−
i− 1
1 γ i− 1
,
δ T
i−
1 γ i− 1
1 γ i− 1
where γ i− 1 =
1). The
initial value M 0 is the identity matrix. If, at some iteration, the matrix is not
found to be definite positive, it is re-initialized to the identity matrix.
That approximation is valid only in the neighborhood of a minimum of the
cost function. Therefore, it is recommended to use simple gradient descent (or
stochastic gradient descent) at the beginning of training, in order to get close
to a minimum, then switch to BFGS when the minimum is close enough.
J ( w ( i ))
−∇
J ( w ( i
1)) and δ i− 1 = w ( i )
w ( i
2.10.5.2 The Levenberg-Marquardt Algorithm
The Levenberg-Marquardt algorithm consists in updating the parameters, at
iteration i ,by
1)) + µ i I ] 1
w ( i )= w ( i
1)
[ H ( w ( i
( w ( i
1)) ,
where µ i is positive. For small values of the step µ i , the Levenberg-Marquardt
algorithm is close to the Newton method, whereas, for large values of µ i ,the
Levenberg-Marquardt algorithm is equivalent to simple gradient descent with
step 1/ µ i .
The application of the algorithm requires the inversion of matrix [ H ( w ( i
1))+ µ i I ]. The exact expression of the Hessian matrix of the total cost function
J ( w )is
∂e k
w ( i )
∂e k
w ( i )
T +
N
N
2 e k
w ( i )( w ( i )) T e k ,
H ( w ( i )) =
k =1
k =1
with e k = y p
y k .
 
Search WWH ::




Custom Search