Modeling Methodology: Dimension Reduction and Resampling Methods - Neural Networks: Methodology and Applications

Tri-Median

The tri-median corresponds to $0.25\,Q_1 + 0.5\,Q_2 + 0.25\,Q_3$, where $Q_1$ is the 1st quartile, $Q_2$ the 2nd quartile (or median), and $Q_3$ the 3rd quartile.
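As a minimal sketch of the formula above, the tri-median can be computed from the three quartiles of a sample. The function name `tri_median` and the sample values are illustrative assumptions, not part of NeMo; quartile conventions also differ between tools, and the "inclusive" interpolation used here is only one possible choice.

```python
from statistics import quantiles

def tri_median(values):
    """Return 0.25*Q1 + 0.5*Q2 + 0.25*Q3 for a one-dimensional sample."""
    # quantiles(..., n=4) returns the three cut points [Q1, Q2, Q3].
    # method="inclusive" is an assumed quartile convention.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return 0.25 * q1 + 0.5 * q2 + 0.25 * q3

# Hypothetical optimal cycle counts obtained from five resampled trainings.
cycle_counts = [120, 135, 150, 160, 400]
print(tri_median(cycle_counts))              # 148.75
print(sum(cycle_counts) / len(cycle_counts))  # 193.0 (mean, pulled up by 400)
```

In this example the single large value moves the mean well away from the bulk of the sample, while the tri-median stays close to the median.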

After determining the optimum number of cycles by one of these strategies, NeMo starts a new training run based on all examples, with the optimized number of cycles $N_c^{\mathrm{optima}}$ defined during the previous phase. For that last training run, the same training parameters (initial value and variation law of the gradient step) are used. If $\varepsilon_a$ denotes the average error computed on the initial base, and $\delta$ the average value of the bias, the generalization error is estimated by

$\varepsilon_g = \varepsilon_a + \delta.$
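The estimate above amounts to adding the mean bias to the average error on the initial base. The sketch below assumes the per-replicate bias values are already available as plain numbers (how each bias is computed is defined earlier in the text and is not reproduced here); the function name and the numbers are hypothetical.

```python
from statistics import mean

def generalization_error(eps_a, biases):
    """Estimate eps_g = eps_a + delta, with delta the average bias."""
    delta = mean(biases)  # average value of the bias over the replicates
    return eps_a + delta

# Hypothetical values: average error on the initial base and
# one bias value per bootstrap replicate.
eps_a = 0.40
biases = [0.05, 0.10, 0.15]
print(generalization_error(eps_a, biases))  # approximately 0.5
```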

More generally, the distribution function of the generalization error is estimated by the empirical distribution function of the bias, shifted by the value $\varepsilon_a$. Compared with cross-validation, the bootstrap associated with early stopping contributes the following:

• to some extent, automation of the design of the network, by adapting the number of early stopping cycles,
• a wider estimate of the variability of the model with respect to the data set,
• estimates of the confidence intervals (margins, uncertainty),
• the use of all examples to construct the network.
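The shifted empirical distribution and the resulting confidence interval mentioned above can be sketched as follows. This is an illustration under assumptions: the bias values are taken as given, the helper names are hypothetical, and the interval uses a simple symmetric tail-trimming rule that the source does not specify.

```python
def shifted_sample(eps_a, biases):
    """Bias values shifted by eps_a, sorted in increasing order."""
    return sorted(eps_a + b for b in biases)

def empirical_cdf(sample, x):
    """Fraction of the sample that is less than or equal to x."""
    return sum(v <= x for v in sample) / len(sample)

def percentile_interval(sample, level=0.8):
    """Interval obtained by trimming the same number of points in each tail.

    `sample` must be sorted. The trimming rule is an assumed convention.
    """
    n = len(sample)
    k = int(round((1.0 - level) / 2.0 * n))  # points dropped in each tail
    return sample[k], sample[n - 1 - k]

# Hypothetical average error and per-replicate bias values.
eps_a = 0.5
biases = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10]

sample = shifted_sample(eps_a, biases)
print(empirical_cdf(sample, 0.555))      # 0.5
print(percentile_interval(sample, 0.8))  # roughly (0.52, 0.59)
```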

Finally, it should be noted that NeMo can monitor the suitability of the model to the data: if the optimized number of cycles is too close to the user-chosen maximum number of cycles, the test error has not reached a minimum. In that case, the user must increase the complexity of the network (number of hidden neurons) or increase the number of training cycles.

3.6.5 Testing the NeMo Method

In the following, we describe the results of an experiment designed to validate the method. The test consists of comparing the average error estimated by NeMo to the actual error. The actual error is approximated by the Monte Carlo method, i.e., by computing the average quadratic error a very large number of times and then averaging the results. We used NeMo for the approximation of two nonlinear analytical functions:

• $\varphi_8(x)\colon \mathbb{R}^8 \to \mathbb{R}$,
• $\varphi_{12}(x)\colon \mathbb{R}^{12} \to \mathbb{R}$.

We chose those functions in order to evaluate the method on the approximation of sufficiently complex functions (large dimensions of input space).
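The Monte Carlo reference error used in this test can be sketched as below. The analytical functions are not given in this passage, so `target` is a hypothetical stand-in on an 8-dimensional input, `model` is a hypothetical stand-in for a trained network, and the uniform sampling range, run counts, and seed are all assumptions.

```python
import math
import random

def target(x):
    # Hypothetical nonlinear function of 8 inputs (not the book's phi_8).
    return math.sin(sum(x)) + x[0] * x[1]

def model(x):
    # Hypothetical stand-in for a trained network: the target plus an offset.
    return target(x) + 0.1

def average_quadratic_error(f, g, dim, n_points, rng):
    """Mean squared difference between f and g on random inputs."""
    total = 0.0
    for _ in range(n_points):
        x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        total += (f(x) - g(x)) ** 2
    return total / n_points

def monte_carlo_error(f, g, dim, n_runs=200, n_points=500, seed=0):
    """Average of many independent average-quadratic-error computations."""
    rng = random.Random(seed)
    runs = [average_quadratic_error(f, g, dim, n_points, rng)
            for _ in range(n_runs)]
    return sum(runs) / n_runs

# With a constant offset of 0.1, every squared error is about 0.01.
print(monte_carlo_error(target, model, dim=8))
```

The value obtained this way plays the role of the "actual error" against which an estimate such as NeMo's would be compared.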
