A more comprehensive comparison study was reported in [220], involving
six risk functionals (MSE, CE, EXP, ZED, SEE, R2EE) and 35 real-world
datasets. The MLPs had the same architecture and their classification tasks
were performed according to the same protocol. Twenty repetitions of the
classification experiments using stratified 10-fold cross-validation were carried
out for datasets with more than 50 instances per class; otherwise, 2-fold cross-
validation was used. Pooled means of training set and test set errors and of
their balanced counterparts, P_e = (P_ed + P_et)/2 and P_b = (P_bd + P_bt)/2, were computed, as well as the pooled standard deviations, s_{P_e} = (s_{P_ed}^2/2 + s_{P_et}^2/2)^{1/2} and s_{P_b} = (s_{P_bd}^2/2 + s_{P_bt}^2/2)^{1/2}.
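These pooled statistics can be sketched as follows; the variable names and the toy error values are illustrative, and the pooled standard deviation is read as the root of the averaged variances:

```python
import math

def pooled_mean(m_design, m_test):
    """Pooled mean of design (training) and test error rates."""
    return (m_design + m_test) / 2

def pooled_std(s_design, s_test):
    """Pooled standard deviation: root of the averaged variances
    (assumed reading of the formula in the text)."""
    return math.sqrt((s_design ** 2 + s_test ** 2) / 2)

# Hypothetical per-set statistics for one risk functional.
p_ed, p_et = 0.10, 0.14   # mean design / test error rates
s_ed, s_et = 0.02, 0.03   # their standard deviations

p_e = pooled_mean(p_ed, p_et)    # pooled error rate
s_pe = pooled_std(s_ed, s_et)    # pooled standard deviation
```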
The generalization ability was assessed in the same way as in Sect. 3.2.2, using D_e = P_et - P_ed and, for the balanced error counterpart, D_b = P_bt - P_bd.
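Under the convention that these scores are test-minus-training error gaps (an assumption based on the usage above; larger values indicate worse generalization), a minimal sketch:

```python
def generalization_gap(p_design, p_test):
    """Generalization score as the test-minus-training error gap
    (assumed convention from Sect. 3.2.2)."""
    return p_test - p_design

# Hypothetical pooled error rates for one risk functional.
d_e = generalization_gap(0.12, 0.18)   # ordinary error gap
d_b = generalization_gap(0.15, 0.22)   # balanced error gap
```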
Large tables of performance statistics and of multiple sign tests are provided in [220]. The statistical tests showed that the ubiquitous MSE was the least interesting risk functional for MLPs: MSE never achieved a significantly better classification performance than the competing risks. CE and EXP were the risks found by several tests (Friedman, multiple sign, chi-square goodness-of-fit for counts of wins and losses, Wilcoxon signed-rank) to be significantly better than their competitors. Counts of significantly better and worse risks also evidenced the usefulness of SEE and R2EE for some datasets. Notably, this study found that for some datasets SEE and R2EE reached a significantly higher performance than any other risk functional; even though performance-wise they ranked between MSE and {CE, EXP}, they were "irreplaceable" for those datasets. No other risk functional showed this behavior: whenever another risk performed best, it had a comparable competitor (no statistically significant difference).
Regarding generalization, it was found that all risks except R2EE behaved similarly. R2EE exhibited significantly poorer generalization, as shown in the Dunn-Sidak [56] diagram for the D_e scores in Fig. 6.11.
Fig. 6.11 Dunn-Sidak comparison intervals of the D_e scores for the six risk functionals (MSE, CE, EXP, ZED, SEE, R2EE).
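The Dunn-Sidak correction underlying such comparison-interval diagrams can be illustrated as follows; this is a generic sketch of the correction under a normal approximation, not the exact computation used in [220]:

```python
import math
from statistics import NormalDist

def sidak_alpha(alpha, n_comparisons):
    """Per-comparison significance level under the Dunn-Sidak correction,
    keeping the family-wise error rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_comparisons)

def comparison_halfwidth(std_err, alpha, n_comparisons):
    """Half-width of a two-sided comparison interval, using the
    normal-approximation critical value for the corrected level."""
    a = sidak_alpha(alpha, n_comparisons)
    z = NormalDist().inv_cdf(1.0 - a / 2)
    return z * std_err

# Six risk functionals give 6 * 5 / 2 = 15 pairwise comparisons.
a_c = sidak_alpha(0.05, 15)   # per-comparison level, well below 0.05
```

Two mean scores are declared significantly different when their comparison intervals do not overlap, which is how the R2EE interval in Fig. 6.11 separates from the others.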
An interesting issue regarding MLPs is their comparison with SVMs, a
type of classifier characterized by optimal generalization ability, given the
inherent constraint on the norm of the weight vector. Collobert and Bengio
[43] elucidated the links between SVMs and MLPs; they showed that under