A more comprehensive comparison study was reported in [220], involving six risk functionals (MSE, CE, EXP, ZED, SEE, and $R_2$EE) and 35 real-world
datasets. The MLPs had the same architecture and their classification tasks
were performed according to the same protocol. Twenty repetitions of the
classification experiments using stratified 10-fold cross-validation were carried
out for datasets with more than 50 instances per class; otherwise, 2-fold cross-validation was used. Pooled means of training set and test set errors and of their balanced counterparts, $P_e = (P_{ed} + P_{et})/2$ and $P_b = (P_{bd} + P_{bt})/2$, were computed, as well as the pooled standard deviations, $s_{P_e} = (s_{P_{ed}}^2/2 + s_{P_{et}}^2/2)^{1/2}$ and $s_{P_b} = (s_{P_{bd}}^2/2 + s_{P_{bt}}^2/2)^{1/2}$.
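As an illustrative sketch (not the authors' code; the data and function name are hypothetical), the pooled statistics can be computed from per-repetition error estimates as follows:

```python
import numpy as np

def pooled_stats(p_d, p_t):
    """Pool training (design) set (p_d) and test set (p_t) error estimates.

    Implements P_e = (P_ed + P_et)/2 and
    s_Pe = (s_Ped^2/2 + s_Pet^2/2)^(1/2) over per-repetition estimates.
    """
    p_d = np.asarray(p_d, dtype=float)
    p_t = np.asarray(p_t, dtype=float)
    pooled_mean = (p_d.mean() + p_t.mean()) / 2
    pooled_std = np.sqrt(p_d.std(ddof=1) ** 2 / 2 + p_t.std(ddof=1) ** 2 / 2)
    return pooled_mean, pooled_std

# Hypothetical error estimates from 20 repetitions of cross-validation
rng = np.random.default_rng(0)
p_ed = rng.normal(0.10, 0.01, 20)  # training (design) set errors
p_et = rng.normal(0.14, 0.02, 20)  # test set errors
P_e, s_Pe = pooled_stats(p_ed, p_et)
```

The same function applies unchanged to the balanced errors, yielding $P_b$ and $s_{P_b}$.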
The generalization ability was assessed in the same way as in Sect. 3.2.2, using $D_e = P_{et} - P_{ed}$, and $D_b = P_{bt} - P_{bd}$ for the balanced error counterpart.
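A minimal numeric sketch of these generalization scores (all values are hypothetical):

```python
# Hypothetical pooled mean errors per risk functional
P_ed = {"MSE": 0.10, "CE": 0.09}  # training (design) set errors
P_et = {"MSE": 0.15, "CE": 0.12}  # test set errors
P_bd = {"MSE": 0.12, "CE": 0.11}  # balanced training set errors
P_bt = {"MSE": 0.18, "CE": 0.15}  # balanced test set errors

# Smaller D_e (resp. D_b) means a smaller train/test gap,
# i.e., better generalization
D_e = {r: P_et[r] - P_ed[r] for r in P_ed}
D_b = {r: P_bt[r] - P_bd[r] for r in P_bd}
```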
Large tables of performance statistics and of multiple sign tests are provided in [220]. The statistical tests showed that the ubiquitous MSE was the least interesting risk functional to be used by MLPs: MSE never achieved a significantly better classification performance than competing risks. CE and EXP were the risks found by the several tests (Friedman, multiple sign, chi-square goodness-of-fit for counts of wins and losses, Wilcoxon paired rank-sum) to be significantly better than their competitors. Counts of significantly better and worse risks also evidenced the usefulness of SEE and $R_2$EE for some datasets. Notably, it was found in this study that for some datasets SEE and $R_2$EE reached a significantly higher performance than any other risk functional; even though performance-wise they positioned between MSE and {CE, EXP}, they were "irreplaceable" for some datasets. This was not evidenced by the other risk functionals: the highest performing risk had a comparable competitor (no statistically significant difference).
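This style of multi-dataset comparison can be sketched with SciPy's implementations of the Friedman and Wilcoxon tests (the error values below are fabricated purely for illustration):

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Hypothetical test-set error rates:
# rows = datasets, columns = risk functionals
# (order: MSE, CE, EXP, ZED, SEE, R2EE)
errors = np.array([
    [0.12, 0.10, 0.10, 0.13, 0.11, 0.14],
    [0.20, 0.18, 0.17, 0.21, 0.19, 0.22],
    [0.08, 0.07, 0.07, 0.09, 0.08, 0.10],
    [0.30, 0.27, 0.28, 0.31, 0.29, 0.33],
    [0.15, 0.13, 0.14, 0.16, 0.15, 0.18],
])

# Friedman test: do the risks differ in performance across datasets?
stat, p = friedmanchisquare(*errors.T)

# Wilcoxon paired test between two particular risks (here MSE vs. CE)
w_stat, w_p = wilcoxon(errors[:, 0], errors[:, 1])
print(f"Friedman p = {p:.3f}, Wilcoxon MSE-vs-CE p = {w_p:.3f}")
```

With only five made-up datasets this is far below the 35 used in [220]; the point is only to show the shape of the analysis, not to reproduce its results.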
As regards the generalization issue, it was found that all risks except $R_2$EE behaved similarly. $R_2$EE exhibited significantly poorer generalization, as shown in the Dunn-Sidak [56] diagram for the $D_e$ scores in Fig. 6.11.
Fig. 6.11 Dunn-Sidak comparison intervals for the $D_e$ scores.
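One common way to build comparison intervals of this kind after a Friedman-type analysis is to place a Šidák-corrected interval around each risk's mean rank; two risks differ significantly when their intervals do not overlap. The sketch below follows the usual normal approximation for mean ranks; the function name and data are hypothetical, not taken from [220]:

```python
import numpy as np
from scipy.stats import norm, rankdata

def sidak_rank_intervals(scores, alpha=0.05):
    """Mean-rank comparison intervals at a Sidak-corrected level.

    scores: (n_datasets, n_risks) array; lower score = better.
    Returns the mean rank per risk and a common interval half-width;
    two risks differ significantly if their intervals do not overlap.
    """
    n, k = scores.shape
    ranks = np.apply_along_axis(rankdata, 1, scores)  # rank risks per dataset
    mean_ranks = ranks.mean(axis=0)
    # Sidak correction for the k*(k-1)/2 pairwise comparisons
    m = k * (k - 1) / 2
    alpha_pc = 1 - (1 - alpha) ** (1 / m)
    crit = norm.ppf(1 - alpha_pc / 2)
    # Half-width such that non-overlap matches the critical mean-rank
    # difference crit * sqrt(k*(k+1)/(6*n))
    half_width = crit * np.sqrt(k * (k + 1) / (24 * n))
    return mean_ranks, half_width

# Made-up scores: the last column (a poorly generalizing risk) always ranks worst
scores = np.tile([0.02, 0.01, 0.03, 0.08], (10, 1))
mean_ranks, hw = sidak_rank_intervals(scores)
```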
An interesting issue regarding MLPs is their comparison with SVMs, a
type of classifier characterized by optimal generalization ability, given the
inherent constraint on the norm of the weight vector. Collobert and Bengio
[43] elucidated the links between SVMs and MLPs; they showed that under