Example 6.2. This is a continuation of Example 6.1, where several MLPs were
trained and tested 40 times, using R^2_EE and also MSE. The number of
neurons in the hidden layer, n_h, was varied from 3 to 6. Error rate assessment
was made with the holdout method: two runs with half the dataset used for
training and the other half for testing, swapping the roles in the second run.
The average test errors over the 40 experiments are shown in Table 6.1, where
one can see that the classification test errors are always smaller with R^2_EE
than with MSE, the difference being highly significant¹ for n_h = 5, 6.
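The two-run holdout procedure described above can be sketched as follows. The MLP of the example is replaced here by a stand-in nearest-class-mean classifier, and all function names are illustrative, not from the text; a random permutation would normally precede the split, but an even/odd split keeps this sketch deterministic.

```python
def two_run_holdout(data, labels, train_fn, error_fn):
    """Split the dataset in half, train on one half and test on the
    other, then swap the roles; return the mean of the two test errors."""
    half_a = list(range(0, len(data), 2))
    half_b = list(range(1, len(data), 2))
    errors = []
    for train_idx, test_idx in ((half_a, half_b), (half_b, half_a)):
        model = train_fn([data[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        errors.append(error_fn(model,
                               [data[i] for i in test_idx],
                               [labels[i] for i in test_idx]))
    return sum(errors) / len(errors)

# Stand-in classifier: assign each point to the nearest class mean.
def train_nearest_mean(xs, ys):
    means = {}
    for c in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == c]
        means[c] = sum(pts) / len(pts)
    return means

def error_rate(means, xs, ys):
    wrong = sum(1 for x, y in zip(xs, ys)
                if min(means, key=lambda c: abs(x - means[c])) != y)
    return wrong / len(xs)

# Two well-separated one-dimensional classes: the estimate is 0% error.
data = [-2.0, -1.9, -1.8, -2.1, 1.8, 1.9, 2.0, 2.1]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
err = two_run_holdout(data, labels, train_nearest_mean, error_rate)
```

Averaging this estimate over 40 repetitions, as in the example, yields the kind of figures reported in Table 6.1.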
Table 6.1  Mean test error (%) and standard deviations (in parentheses) for
Example 6.2, with Student-t test probability (p). Statistically significantly
lower errors in bold.

  n_h   R^2_EE        MSE           p
  3     2.43 (1.33)   2.93 (1.46)   0.057
  4     2.20 (1.20)   2.55 (1.24)   0.102
  5     2.01 (1.09)   2.64 (1.13)   0.007
  6     2.09 (1.02)   2.91 (1.73)   0.006
In the following sub-sections we present three procedures aimed at attaining
near-optimal performance while at the same time speeding up the learning
process. The latter issue is of particular relevance, since the algorithms using
EE risk functionals are of higher complexity than those using classic risk
functionals.
6.1.1.1 The Smoothing Parameter
We have seen in Chap. 3 how the choice of the smoothing parameter (kernel
bandwidth), h, influences the convergence towards the MEE solution. The
smoothing parameter is also recognized in [198] as one of the most important
factors influencing the final results of a classification problem when using
MLPs trained with MEE.
As seen in Appendix E, the choice of h for optimal density estimation by
the Parzen window method depends on the number of available samples and
on the PDF one wishes to estimate, as well as on the kernel function, which we
always assume to be Gaussian. Since what one needs is not an optimal PDF
estimate but its fat estimate, as described in Sect. 3.1.3, some latitude in
the choice of h is allowed; one only requires that h not be too far above the
optimal value, for the reasons pointed out in the preceding chapters (namely,
in Example 3.2 and in the last paragraph of Sect. 4.1.3.1).
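As a concrete reference point for the order of magnitude of h, the sketch below pairs the Gaussian-kernel Parzen window estimate with Silverman's normal-reference rule h = 1.06 σ n^(-1/5). This rule is not the formula of Appendix E but a standard rule of thumb for Gaussian kernels; in the fat-estimation setting one would deliberately pick h somewhat above such a value.

```python
import math

def silverman_h(samples):
    """Normal-reference bandwidth h = 1.06 * sigma * n^(-1/5) for a
    Gaussian kernel (a common rule of thumb, used here only to anchor
    the order of magnitude of h)."""
    n = len(samples)
    mean = sum(samples) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    return 1.06 * sigma * n ** (-0.2)

def parzen_pdf(x, samples, h):
    """Parzen window density estimate with a Gaussian kernel."""
    norm = len(samples) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples) / norm

# 50 fixed points in [-1, 1] (deterministic stand-in for a data sample).
samples = [math.sin(1000.0 * i) for i in range(1, 51)]
h = silverman_h(samples)
h_fat = 2.0 * h  # a "fat" estimate uses a bandwidth above the reference value
```

Both estimates integrate to one; the fat estimate is simply an oversmoothed version of the one obtained with the reference bandwidth.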
¹ Statistical significance is set at the usual 5% level throughout this chapter.