errors. Otherwise, the training algorithm, the program, or both should be
checked for errors.
The structure of the student network is identical to that of the teacher
network, up to a permutation of the hidden neurons. This is a consequence of
the uniqueness theorem [Sontag 1993].
Two Test Problems
Problem 1: A network with 8 inputs, 6 hidden neurons and one output is
generated by drawing weights uniformly in the interval [−20, +20]; a training
set and a test set of 1,500 examples each are generated with random inputs
from a uniform distribution in [−1, +1]; a network having the same structure is
trained as follows: initialization of the parameters from a uniform distribution
in [−0.6, +0.6], computation of the gradient by backpropagation, minimization
of the cost function by the Levenberg-Marquardt algorithm. The teacher
network is retrieved exactly (TMSE and VMSE on the order of 10⁻³¹) in
96% of trainings (48 trainings out of the 50 performed with different
initializations).
Problem 2: A network with 10 inputs, 5 hidden neurons and one output is
generated with weights drawn uniformly in [−1, +1]; a training set and a test
set are generated with random inputs from a normal distribution; training is
performed as in the previous example; the teacher network is retrieved in 96%
of the trainings if the training set has 400 examples, and in 100% of
the trainings if the training set has 2,000 examples.
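Both problems share the same teacher-student setup, so a single sketch can illustrate them. The following is only a minimal sketch, not the authors' code: it assumes tanh hidden neurons with a linear output neuron and biases (the text does not specify the activations), and it replaces the backpropagation gradient of the text with scipy's Levenberg-Marquardt solver using a numerically estimated Jacobian. The dimensions, intervals and set sizes are those of Problem 1; changing n_in, n_hid, the weight interval and the input distribution gives Problem 2.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
n_in, n_hid = 8, 6
n_params = n_in * n_hid + n_hid + n_hid + 1  # hidden weights, hidden biases, output weights, output bias

def unpack(p):
    # split the flat parameter vector into layer weights and biases
    w1 = p[:n_in * n_hid].reshape(n_hid, n_in)
    b1 = p[n_in * n_hid:n_in * n_hid + n_hid]
    w2 = p[n_in * n_hid + n_hid:n_in * n_hid + 2 * n_hid]
    b2 = p[-1]
    return w1, b1, w2, b2

def net(p, X):
    # one hidden layer of tanh neurons, linear output (assumed activations)
    w1, b1, w2, b2 = unpack(p)
    return np.tanh(X @ w1.T + b1) @ w2 + b2

# teacher network: weights drawn uniformly in [-20, +20]
p_teacher = rng.uniform(-20, 20, n_params)

# training and test sets of 1,500 examples, inputs uniform in [-1, +1]
X_train = rng.uniform(-1, 1, (1500, n_in))
X_test = rng.uniform(-1, 1, (1500, n_in))
y_train = net(p_teacher, X_train)
y_test = net(p_teacher, X_test)

# student network: same structure, parameters initialized uniformly in [-0.6, +0.6]
p0 = rng.uniform(-0.6, 0.6, n_params)

# Levenberg-Marquardt minimization of the sum of squared residuals
# (Jacobian estimated numerically here; the text uses backpropagation)
res = least_squares(lambda p: net(p, X_train) - y_train, p0, method='lm')

tmse = np.mean((net(res.x, X_train) - y_train) ** 2)
vmse = np.mean((net(res.x, X_test) - y_test) ** 2)
print(f"TMSE = {tmse:.3e}, VMSE = {vmse:.3e}")
```

Repeating the run with different seeds (different initializations) reproduces the experiment of the text, in which the teacher network is retrieved in most, but not all, trainings.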
For the same problems, training always fails to retrieve the teacher network
if simple gradient descent or stochastic gradient (see next section) is used,
with or without a momentum term.
Note that the teacher-student problem becomes difficult for some architec-
tures because of a large number of local minima.
2.5.2.4 Summary
We summarize the procedure that must be used for training a feedforward
neural network with a given number of inputs and hidden neurons:
• Initialize the parameters with the method described above.
• Compute the gradient of the cost function by backpropagation.
• Update the parameters iteratively with an appropriate minimization algorithm (simple gradient descent, BFGS, Levenberg-Marquardt, conjugate gradient, etc.).
• If a prescribed maximum number of epochs is reached, or if the norm of the variation of the parameter vector is smaller than a given threshold (the weights no longer change significantly), or if the norm of the gradient is smaller than a given threshold (a minimum has been reached), terminate the procedure; otherwise, start a new epoch by returning to the gradient computation. A minimal sketch of such a loop is given below.
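As an illustration only, the following sketch combines the three stopping criteria above in a generic training loop. Here grad_fn is a hypothetical function returning the gradient of the cost function at the current parameter vector (computed by backpropagation for a neural network), and plain gradient descent stands in for whichever minimization algorithm is chosen; the learning rate and thresholds are arbitrary defaults.

```python
import numpy as np

def train(grad_fn, p0, lr=0.01, max_epochs=10000, dp_tol=1e-8, grad_tol=1e-6):
    # grad_fn(p): gradient of the cost function at p (e.g. by backpropagation);
    # simple gradient descent is used here as the minimization algorithm.
    p = p0.copy()
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        g = grad_fn(p)
        if np.linalg.norm(g) < grad_tol:        # a minimum has been reached
            break
        p_new = p - lr * g                      # parameter update for this epoch
        if np.linalg.norm(p_new - p) < dp_tol:  # weights no longer change significantly
            p = p_new
            break
        p = p_new
    return p, epoch
```

The same skeleton applies when the update line is replaced by a BFGS, Levenberg-Marquardt or conjugate-gradient step; only the parameter update changes, while the gradient computation and the stopping criteria remain as above.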