Example 6.2. This is a continuation of Example 6.1, where several MLPs were
trained and tested 40 times, using R^2_EE and also MSE. The number of
neurons in the hidden layer, n_h, was varied from 3 to 6. Error rate assessment
was made with the holdout method: two runs with half the dataset used for
training and the other half for testing, swapping the roles in the second run.
The average test errors over the 40 experiments are shown in Table 6.1, where
one can see that the classification test errors are always smaller with R^2_EE
than with MSE, the difference being highly significant¹ for n_h = 5, 6.
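The two-run holdout procedure described above can be sketched as follows. The MLP of the example is replaced here by a stand-in nearest-class-mean classifier, and all function names are illustrative, not from the text; a random permutation would normally precede the split, but an even/odd split keeps this sketch deterministic.

```python
def two_run_holdout(data, labels, train_fn, error_fn):
    """Split the dataset in half, train on one half and test on the
    other, then swap the roles; return the mean of the two test errors."""
    half_a = list(range(0, len(data), 2))
    half_b = list(range(1, len(data), 2))
    errors = []
    for train_idx, test_idx in ((half_a, half_b), (half_b, half_a)):
        model = train_fn([data[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        errors.append(error_fn(model,
                               [data[i] for i in test_idx],
                               [labels[i] for i in test_idx]))
    return sum(errors) / len(errors)

# Stand-in classifier: assign each point to the nearest class mean.
def train_nearest_mean(xs, ys):
    means = {}
    for c in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == c]
        means[c] = sum(pts) / len(pts)
    return means

def error_rate(means, xs, ys):
    wrong = sum(1 for x, y in zip(xs, ys)
                if min(means, key=lambda c: abs(x - means[c])) != y)
    return wrong / len(xs)

# Two well-separated one-dimensional classes: the estimate is 0% error.
data = [-2.0, -1.9, -1.8, -2.1, 1.8, 1.9, 2.0, 2.1]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
err = two_run_holdout(data, labels, train_nearest_mean, error_rate)
```

Averaging this estimate over 40 repetitions, as in the example, yields the kind of figures reported in Table 6.1.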
Table 6.1  Mean test error (%) and standard deviations (in parentheses) for
Example 6.2, with Student-t test probability (p). Statistically significantly
lower errors in bold.

  n_h   R^2_EE        MSE           p
  3     2.43 (1.33)   2.93 (1.46)   0.057
  4     2.20 (1.20)   2.55 (1.24)   0.102
  5     2.01 (1.09)   2.64 (1.13)   0.007
  6     2.09 (1.02)   2.91 (1.73)   0.006
In the following sub-sections we present three procedures aimed at attaining
near-optimal performance while at the same time speeding up the learning
process. The latter issue is of particular relevance, since the algorithms using
EE risk functionals are of higher complexity than those using classic risk
functionals.
6.1.1.1 The Smoothing Parameter
We have seen in Chap. 3 how the choice of the smoothing parameter (kernel
bandwidth), h, influences the convergence towards the MEE solution. The
smoothing parameter is also recognized in [198] as one of the most important
factors influencing the final results of a classification problem when using
MLPs trained with MEE.
As seen in Appendix E, the choice of h for optimal density estimation by
the Parzen window method depends on the number of available samples and
on the PDF one wishes to estimate, as well as on the kernel function, which we
always assume to be Gaussian. Since what one needs is not an optimal PDF
estimate but its fat estimate, as described in Sect. 3.1.3, some latitude in
the choice of h is allowed; one only requires that h not be too far above the
optimal value, for the reasons pointed out in the preceding chapters (namely,
in Example 3.2 and in the last paragraph of Sect. 4.1.3.1).
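As a concrete reference point for the order of magnitude of h, the sketch below pairs the Gaussian-kernel Parzen window estimate with Silverman's normal-reference rule h = 1.06 σ n^(-1/5). This rule is not the formula of Appendix E but a standard rule of thumb for Gaussian kernels; in the fat-estimation setting one would deliberately pick h somewhat above such a value.

```python
import math

def silverman_h(samples):
    """Normal-reference bandwidth h = 1.06 * sigma * n^(-1/5) for a
    Gaussian kernel (a common rule of thumb, used here only to anchor
    the order of magnitude of h)."""
    n = len(samples)
    mean = sum(samples) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    return 1.06 * sigma * n ** (-0.2)

def parzen_pdf(x, samples, h):
    """Parzen window density estimate with a Gaussian kernel."""
    norm = len(samples) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples) / norm

# 50 fixed points in [-1, 1] (deterministic stand-in for a data sample).
samples = [math.sin(1000.0 * i) for i in range(1, 51)]
h = silverman_h(samples)
h_fat = 2.0 * h  # a "fat" estimate uses a bandwidth above the reference value
```

Both estimates integrate to one; the fat estimate is simply an oversmoothed version of the one obtained with the reference bandwidth.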
¹ Statistical significance is set at the usual 5% level throughout this chapter.