for $n$ odd: $f_{r_n}(x) = \sqrt{x}\,P_{(n-3)/2}(x)$, with, for $n \geq 5$:
$$P_{(n-3)/2}(x) = 1 + \sum_{k=1}^{(n-3)/2} \frac{(2k-1)!!}{2^k\,k!}\,(1-x)^k;$$
for $n = 3$: $P_0(x) = 1$; for $n = 1$: $P_{-1}(x) = 0$.
2.10.5 Optimization Methods: Levenberg-Marquardt and BFGS
The presentation given here follows [Oussar 1998]; the algorithms are also described in [Press et al. 1992].
2.10.5.1 The BFGS Algorithm
The BFGS algorithm consists in updating the parameters, at iteration $i$, by
$$w(i) = w(i-1) - \mu_i M_i \nabla J(w(i-1)),$$
where $\mu_i$ is positive, and where $M_i$ is an approximation, computed iteratively, of the inverse of the Hessian matrix; the latter is computed, at each iteration, by
$$M_i = M_{i-1} + \left[1 + \frac{\gamma_{i-1}^{\mathrm{T}} M_{i-1} \gamma_{i-1}}{\delta_{i-1}^{\mathrm{T}} \gamma_{i-1}}\right] \frac{\delta_{i-1}\,\delta_{i-1}^{\mathrm{T}}}{\delta_{i-1}^{\mathrm{T}} \gamma_{i-1}} - \frac{\delta_{i-1}\,\gamma_{i-1}^{\mathrm{T}} M_{i-1} + M_{i-1}\,\gamma_{i-1}\,\delta_{i-1}^{\mathrm{T}}}{\delta_{i-1}^{\mathrm{T}} \gamma_{i-1}},$$
where $\gamma_{i-1} = \nabla J(w(i)) - \nabla J(w(i-1))$ and $\delta_{i-1} = w(i) - w(i-1)$. The initial value $M_0$ is the identity matrix. If, at some iteration, the matrix is not found to be positive definite, it is re-initialized to the identity matrix.
That approximation is valid only in the neighborhood of a minimum of the cost function. Therefore, it is recommended to use simple gradient descent (or stochastic gradient descent) at the beginning of training, in order to get close to a minimum, then switch to BFGS when the minimum is close enough.
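To make the update rule concrete, here is a minimal NumPy sketch of this BFGS iteration on a hypothetical quadratic cost (the matrix `A`, vector `b`, and the fixed step `mu` are illustrative assumptions; in practice $\mu_i$ would be chosen by a line search):

```python
import numpy as np

# Hypothetical quadratic cost J(w) = 0.5 w^T A w - b^T w, used only to
# illustrate the update rule; A and b are not from the text.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad_J(w):
    return A @ w - b

w = np.zeros(2)
M = np.eye(2)    # M_0: the identity matrix
mu = 0.5         # positive step (a line search would normally set it)

for i in range(100):
    g = grad_J(w)
    w_new = w - mu * M @ g        # w(i) = w(i-1) - mu_i M_i grad J(w(i-1))
    gamma = grad_J(w_new) - g     # gamma_{i-1} = grad J(w(i)) - grad J(w(i-1))
    delta = w_new - w             # delta_{i-1} = w(i) - w(i-1)
    dg = delta @ gamma
    if dg > 1e-12:
        # BFGS update of the inverse-Hessian approximation
        M = (M
             + (1.0 + gamma @ M @ gamma / dg) * np.outer(delta, delta) / dg
             - (np.outer(delta, gamma @ M) + np.outer(M @ gamma, delta)) / dg)
    else:
        M = np.eye(2)             # re-initialize when the update is unsafe
    w = w_new

print(w)   # close to the minimizer A^{-1} b, i.e. about (0.2, 0.4)
```

On a quadratic cost, $M_i$ converges to the exact inverse Hessian $A^{-1}$ after a few updates, which is why the iterates contract rapidly toward the minimum.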
2.10.5.2 The Levenberg-Marquardt Algorithm
The Levenberg-Marquardt algorithm consists in updating the parameters, at iteration $i$, by
$$w(i) = w(i-1) - \left[H(w(i-1)) + \mu_i I\right]^{-1} \nabla J(w(i-1)),$$
where $\mu_i$ is positive. For small values of the step $\mu_i$, the Levenberg-Marquardt algorithm is close to the Newton method, whereas, for large values of $\mu_i$, the Levenberg-Marquardt algorithm is equivalent to simple gradient descent with step $1/\mu_i$.
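The two limiting regimes follow directly from comparing $\mu_i$ with the eigenvalues of $H$:

```latex
% mu_i small compared with the eigenvalues of H: the damping term is negligible
[H + \mu_i I]^{-1} \approx H^{-1}
\quad\Rightarrow\quad
w(i) \approx w(i-1) - H^{-1}\,\nabla J(w(i-1))
% which is the Newton step.

% mu_i large compared with the eigenvalues of H: H itself is negligible
[H + \mu_i I]^{-1} \approx \tfrac{1}{\mu_i}\, I
\quad\Rightarrow\quad
w(i) \approx w(i-1) - \tfrac{1}{\mu_i}\,\nabla J(w(i-1))
% which is simple gradient descent with step 1/mu_i.
```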
The application of the algorithm requires the inversion of the matrix $[H(w(i-1)) + \mu_i I]$. The exact expression of the Hessian matrix of the total cost function $J(w)$ is
$$H(w(i)) = \sum_{k=1}^{N} \frac{\partial e_k}{\partial w(i)} \left(\frac{\partial e_k}{\partial w(i)}\right)^{\mathrm{T}} + \sum_{k=1}^{N} \frac{\partial^2 e_k}{\partial w(i)\,(\partial w(i))^{\mathrm{T}}}\, e_k,$$
with $e_k = y_p^k - y^k$.
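As an illustration, here is a minimal NumPy sketch of this update on a hypothetical least-squares fit of a linear model (the data and the fixed $\mu_i$ are assumptions; note that for a linear model the second-order term of the Hessian vanishes, so the sum of gradient outer products is exact):

```python
import numpy as np

# Illustrative linear model y(x, w) = w[0] + w[1] * x fitted to synthetic
# data generated from y = 1 + 2x; these values are not from the text.
x = np.array([0.0, 1.0, 2.0, 3.0])
yp = np.array([1.0, 3.0, 5.0, 7.0])      # y_p^k: measured values

def residuals(w):
    return yp - (w[0] + w[1] * x)        # e_k = y_p^k - y^k

def lm_step(w, mu):
    e = residuals(w)
    # Each row of Jac is de_k/dw; for this model it is constant in w.
    Jac = np.column_stack([-np.ones_like(x), -x])
    H = Jac.T @ Jac                      # sum_k (de_k/dw)(de_k/dw)^T
    grad = Jac.T @ e                     # gradient of J(w) = 0.5 sum_k e_k^2
    # w(i) = w(i-1) - [H + mu_i I]^{-1} grad J(w(i-1))
    return w - np.linalg.solve(H + mu * np.eye(2), grad)

w = np.zeros(2)
for i in range(30):
    w = lm_step(w, mu=1e-3)
print(w)   # approaches the generating parameters (1, 2)
```

In a full implementation, $\mu_i$ would be adapted at each iteration (increased when a step raises the cost, decreased when it lowers it), which is what makes the method interpolate between the Newton and gradient regimes.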