Modeling with Neural Networks: Principles and Model Design Methodology - Neural Networks: Methodology and Applications

Information Technology Reference

In-Depth Information

The above relations are valid for a single-output model. The extension to

multiple outputs is straightforward.

Because the second term of the above relation is proportional to the error

e k , it can be neglected in a first approximation, which yields

∂e k

∂ w ( i )

∂e k

∂ w ( i )

∂y k

∂ w ( i )

∂y k

∂ w ( i )

T .

H ( w ( i )) =

k =1

In the case of a model that is linear with respect to its parameters, y is a

linear function of w , so that the second term of the expression of is equal to

zero.

Several techniques can be useful for inverting matrix [ H + µ i I ].

•

Indirect inversion.

The inverse of a matrix can be computed recursively by the following

inversion lemma. If A , B , C and D denote four matrices, one has:

A − 1 B C − 1 + DA − 1 B − 1 DA − 1 .

( A + BCD ) − 1 = A − 1

−

Moreover, with the notation ζ k =( ∂y k ) / ( ∂ w ( i )), matrix H can be con-

structed recursively by defining “partial” matrices H k , of dimension ( k , k ),

H k = H k− 1 + z K ( z k ) T with k =1 ,...,N . One has H = H N as desired.

If the inversion lemma is applied to the previous relation with A =

H , B

= ζ k , C = I ,and D = ζ k T , one gets:

H k− 1 − 1

ζ k ζ k T H k− 1 − 1

1+( ζ k ) T H k− 1 − 1

H k − 1

= H k− 1 − 1

−

ζ k

At the first step ( k := 1), one takes H 0 = µ i I , which gives, at step N :

H N − 1

= ( H )+ µ i I − 1

•

Direct inversion.

Many inversion methods exist. Since the algorithm is iterative, and since

the line search procedure (described below) often requires several matrix in-

versions, an e cient inversion method is mandatory. Since the approximation

of the Hessian matrix µ i I is symmetric definite positive, it is advantageous

to use Cholesky's method [Press et al. 1992].

As for simple gradient descent and for BFGS, µ i must be adjusted at each

iteration. A line search method can be used as discussed in the next section.

Note that the expression of the Hessian of the cost function is specific to the

least squares cost function; in contrast to the BFGS method, the Levenberg-

Marquardt algorithm cannot be used for minimizing arbitrary cost functions,

in particular the cross-entropy cost function, often used for classification.

Neural Networks: Methodology and Applications

Search WWH ::

Custom Search

Home