Tri-Median
The tri-median is the weighted quartile average $0.25\,Q_1 + 0.5\,Q_2 + 0.25\,Q_3$,
where $Q_1$ is the 1st quartile, $Q_2$ the 2nd quartile (the median), and $Q_3$ the 3rd quartile.
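
As an illustration, the tri-median formula can be sketched in a few lines of Python. The function name `tri_median` and the choice of quartile convention (the "inclusive" linear-interpolation method of the standard library) are assumptions made for this sketch; the text does not say which quartile convention NeMo uses.

```python
from statistics import quantiles


def tri_median(values):
    """Weighted quartile average: 0.25*Q1 + 0.5*Q2 + 0.25*Q3."""
    # Assumption: quartiles are taken with the 'inclusive' method, which
    # interpolates linearly between the sorted observations.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return 0.25 * q1 + 0.5 * q2 + 0.25 * q3
```

With this convention, `tri_median([1, 2, 3, 4, 100])` gives 3.0, whereas the arithmetic mean of the same values is 22: the tri-median is much less sensitive to a single extreme value.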
After determining the optimum number of cycles by one of the strategies,
NeMo starts a new training cycle based on all examples, with the optimized
number of cycles $N_c^{\mathrm{optima}}$ defined during the previous phase. For that last
training cycle, the same training parameters (initial value and variation law
of the gradient step) are used. If $\varepsilon_a$ denotes the average error computed on
the initial base, and $\delta$ the average value of the bias, the generalization error
is estimated by
$$\varepsilon_g = \varepsilon_a + \delta.$$
More generally, the distribution function of the generalization error is estimated
by the empirical distribution function of the bias, shifted by the value
$\varepsilon_a$. Compared with cross-validation, the bootstrap associated with early
stopping contributes the following:
- to some extent, the automation of the design of the network, by adapting
  the number of early-stopping cycles,
- a wider estimate of the variability of the model with respect to the data
  set,
- estimates of the confidence intervals (margins, uncertainty),
- the use of all examples to construct the network.
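
The estimate of the generalization error described above (average error on the initial base plus average bias, and the bias distribution shifted by that average error) can be sketched as follows. This is only a minimal illustration: the function names, the nearest-rank percentile rule, and the assumption that the per-replica bias values are already available as a list are all choices made for this sketch, not details given in the text.

```python
import math


def generalization_error(eps_a, biases):
    """Point estimate: eps_g = eps_a + delta, with delta the mean bias."""
    delta = sum(biases) / len(biases)
    return eps_a + delta


def shifted_bias_sample(eps_a, biases):
    """Sorted sample whose empirical distribution estimates that of eps_g."""
    return sorted(eps_a + b for b in biases)


def empirical_interval(sorted_sample, level=0.90):
    """Two-sided interval read off the sorted sample.

    Assumption: percentiles use the nearest-rank rule.
    """
    def nearest_rank(p):
        k = max(1, math.ceil(p * len(sorted_sample)))
        return sorted_sample[k - 1]

    tail = (1.0 - level) / 2.0
    return nearest_rank(tail), nearest_rank(1.0 - tail)
```

The interval returned by `empirical_interval` is one possible reading of the "margins" mentioned in the list; other percentile conventions would give slightly different bounds on small samples.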
Finally, it should be noted that NeMo may monitor the suitability of the
model to the data: if the optimized number of cycles is too close to the
user-chosen maximum number of cycles, the test error has not reached a minimum.
In that case, the user must increase the complexity of the network (number of
hidden neurons) or increase the number of training cycles.
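
The check described in this paragraph might look like the sketch below. The function name and the 10% margin used to decide what "too close" means are hypothetical; the text does not give the criterion NeMo actually applies.

```python
def stopping_is_reliable(n_optimal, n_max, margin=0.10):
    """Return False when the optimized cycle count is within `margin`
    (as a fraction of n_max) of the user-chosen maximum.

    Assumption: 'too close' is expressed as a relative margin.
    """
    return n_optimal < (1.0 - margin) * n_max
```

When the function returns False, the test error was presumably still decreasing at the cap, so the remedies named above apply: more hidden neurons or a larger maximum number of cycles.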
3.6.5 Testing the NeMo Method
In the following, we describe the results of an experiment designed to validate
the method. The test consists in comparing the average error estimated by
NeMo to the actual error. The actual error is approximated according to the
Monte Carlo method, i.e., by making a very large number of computations of
the average quadratic error, then by computing its average. We used NeMo
for the approximation of two nonlinear analytical functions,
$$\mathbb{R}^{8} \xrightarrow{\ \varphi_8(x)\ } \mathbb{R}, \qquad \mathbb{R}^{12} \xrightarrow{\ \varphi_{12}(x)\ } \mathbb{R}.$$
We chose those functions in order to evaluate the method on the approximation
of sufficiently complex functions (large dimensions of input space).
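
The Monte Carlo approximation of the actual error described above can be sketched as follows: the average quadratic error is computed many times on freshly drawn inputs, and those averages are then averaged. Everything specific here is an assumption made for illustration: the function name, the uniform sampling of inputs on [-1, 1], the number of runs and points, and the stand-in target and model used in the usage example (the analytical functions used in the experiment are not reproduced here).

```python
import random


def monte_carlo_error(model, target, dim, n_runs=200, n_points=500, seed=0):
    """Average, over n_runs, of the mean squared error on n_points inputs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    run_errors = []
    for _ in range(n_runs):
        total = 0.0
        for _ in range(n_points):
            # Assumption: inputs are uniform on [-1, 1] in each dimension.
            x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            total += (model(x) - target(x)) ** 2
        run_errors.append(total / n_points)
    return sum(run_errors) / n_runs


# Usage with placeholder functions (not the functions of the experiment):
def toy_target(x):
    return sum(v * v for v in x)


def toy_model(x):
    return toy_target(x) + 0.1  # a model with a constant offset of 0.1


actual_error = monte_carlo_error(toy_model, toy_target, dim=8,
                                 n_runs=20, n_points=50)
```

For this placeholder model the quadratic error equals the squared offset at every input, so `actual_error` is 0.01 up to rounding, which makes the sketch easy to sanity-check before substituting a trained network.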
 