A more comprehensive comparison study was reported in [220], involving six risk functionals (MSE, CE, EXP, ZED, SEE, and $R_2$EE) and 35 real-world
datasets. The MLPs had the same architecture and their classification tasks
were performed according to the same protocol. Twenty repetitions of the
classification experiments using stratified 10-fold cross-validation were carried
out for datasets with more than 50 instances per class; otherwise, 2-fold cross-validation was used. Pooled means of training set and test set errors and of their balanced counterparts, $P_e = (P_{ed} + P_{et})/2$ and $P_b = (P_{bd} + P_{bt})/2$, were computed, as well as the pooled standard deviations, $s_{P_e} = (s_{P_{ed}}^2/2 + s_{P_{et}}^2/2)^{1/2}$ and $s_{P_b} = (s_{P_{bd}}^2/2 + s_{P_{bt}}^2/2)^{1/2}$.
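As an illustrative sketch (not the authors' code; the data and function name are hypothetical), the pooled statistics can be computed from per-repetition error estimates as follows:

```python
import numpy as np

def pooled_stats(p_d, p_t):
    """Pool training (design) set (p_d) and test set (p_t) error estimates.

    Implements P_e = (P_ed + P_et)/2 and
    s_Pe = (s_Ped^2/2 + s_Pet^2/2)^(1/2) over per-repetition estimates.
    """
    p_d = np.asarray(p_d, dtype=float)
    p_t = np.asarray(p_t, dtype=float)
    pooled_mean = (p_d.mean() + p_t.mean()) / 2
    pooled_std = np.sqrt(p_d.std(ddof=1) ** 2 / 2 + p_t.std(ddof=1) ** 2 / 2)
    return pooled_mean, pooled_std

# Hypothetical error estimates from 20 repetitions of cross-validation
rng = np.random.default_rng(0)
p_ed = rng.normal(0.10, 0.01, 20)  # training (design) set errors
p_et = rng.normal(0.14, 0.02, 20)  # test set errors
P_e, s_Pe = pooled_stats(p_ed, p_et)
```

The same function applies unchanged to the balanced errors, yielding $P_b$ and $s_{P_b}$.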
The generalization ability was assessed in the same way as in Sect. 3.2.2, using $D_e = P_{et} - P_{ed}$, and $D_b = P_{bt} - P_{bd}$ for the balanced error counterpart.
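A minimal numeric sketch of these generalization scores (all values are hypothetical):

```python
# Hypothetical pooled mean errors per risk functional
P_ed = {"MSE": 0.10, "CE": 0.09}  # training (design) set errors
P_et = {"MSE": 0.15, "CE": 0.12}  # test set errors
P_bd = {"MSE": 0.12, "CE": 0.11}  # balanced training set errors
P_bt = {"MSE": 0.18, "CE": 0.15}  # balanced test set errors

# Smaller D_e (resp. D_b) means a smaller train/test gap,
# i.e., better generalization
D_e = {r: P_et[r] - P_ed[r] for r in P_ed}
D_b = {r: P_bt[r] - P_bd[r] for r in P_bd}
```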
Large tables of performance statistics and of multiple sign tests are provided in [220]. The statistical tests showed that the ubiquitous MSE was the least interesting risk functional to be used by MLPs: MSE never achieved a significantly better classification performance than competing risks. CE and EXP were the risks found by the several tests (Friedman, multiple sign, chi-square goodness-of-fit for counts of wins and losses, Wilcoxon paired rank-sum) to be significantly better than their competitors. Counts of significantly better and worse risks also evidenced the usefulness of SEE and $R_2$EE for some datasets. Notably, it was found in this study that for some datasets SEE and $R_2$EE reached a significantly higher performance than any other risk functional; even though performance-wise they positioned between MSE and {CE, EXP}, they were "irreplaceable" for some datasets. This was not evidenced by the other risk functionals: the highest performing risk had a comparable competitor (no statistically significant difference).
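This style of multi-dataset comparison can be sketched with SciPy's implementations of the Friedman and Wilcoxon tests (the error values below are fabricated purely for illustration):

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Hypothetical test-set error rates:
# rows = datasets, columns = risk functionals
# (order: MSE, CE, EXP, ZED, SEE, R2EE)
errors = np.array([
    [0.12, 0.10, 0.10, 0.13, 0.11, 0.14],
    [0.20, 0.18, 0.17, 0.21, 0.19, 0.22],
    [0.08, 0.07, 0.07, 0.09, 0.08, 0.10],
    [0.30, 0.27, 0.28, 0.31, 0.29, 0.33],
    [0.15, 0.13, 0.14, 0.16, 0.15, 0.18],
])

# Friedman test: do the risks differ in performance across datasets?
stat, p = friedmanchisquare(*errors.T)

# Wilcoxon paired test between two particular risks (here MSE vs. CE)
w_stat, w_p = wilcoxon(errors[:, 0], errors[:, 1])
print(f"Friedman p = {p:.3f}, Wilcoxon MSE-vs-CE p = {w_p:.3f}")
```

With only five made-up datasets this is far below the 35 used in [220]; the point is only to show the shape of the analysis, not to reproduce its results.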
As regards the generalization issue, it was found that all risks except $R_2$EE behaved similarly. $R_2$EE exhibited significantly poorer generalization, as shown in the Dunn-Sidak [56] diagram for the $D_e$ scores in Fig. 6.11.
Fig. 6.11 Dunn-Sidak comparison intervals for the $D_e$ scores.
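One common way to build comparison intervals of this kind after a Friedman-type analysis is to place a Šidák-corrected interval around each risk's mean rank; two risks differ significantly when their intervals do not overlap. The sketch below follows the usual normal approximation for mean ranks; the function name and data are hypothetical, not taken from [220]:

```python
import numpy as np
from scipy.stats import norm, rankdata

def sidak_rank_intervals(scores, alpha=0.05):
    """Mean-rank comparison intervals at a Sidak-corrected level.

    scores: (n_datasets, n_risks) array; lower score = better.
    Returns the mean rank per risk and a common interval half-width;
    two risks differ significantly if their intervals do not overlap.
    """
    n, k = scores.shape
    ranks = np.apply_along_axis(rankdata, 1, scores)  # rank risks per dataset
    mean_ranks = ranks.mean(axis=0)
    # Sidak correction for the k*(k-1)/2 pairwise comparisons
    m = k * (k - 1) / 2
    alpha_pc = 1 - (1 - alpha) ** (1 / m)
    crit = norm.ppf(1 - alpha_pc / 2)
    # Half-width such that non-overlap matches the critical mean-rank
    # difference crit * sqrt(k*(k+1)/(6*n))
    half_width = crit * np.sqrt(k * (k + 1) / (24 * n))
    return mean_ranks, half_width

# Made-up scores: the last column (a poorly generalizing risk) always ranks worst
scores = np.tile([0.02, 0.01, 0.03, 0.08], (10, 1))
mean_ranks, hw = sidak_rank_intervals(scores)
```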
An interesting issue regarding MLPs is their comparison with SVMs, a
type of classifier characterized by optimal generalization ability, given the
inherent constraint on the norm of the weight vector. Collobert and Bengio
[43] elucidated the links between SVMs and MLPs; they showed that under