In this section we present a methodology for analyzing the results offered by a
pair of algorithms in a given study, using non-parametric tests in a multiple data
set analysis. Furthermore, we comment on the possibility of comparison with other
deterministic ML algorithms. Non-parametric tests can be applied to small samples
of data, and their effectiveness has been proven in complex experiments. They are
preferable to adjusting the data with transformations or to discarding certain
extreme observations (outliers) [16].
This section is devoted to describing a non-parametric statistical procedure for
performing pairwise comparisons between two algorithms, also known as the Wilcoxon
signed-rank test (Sect. 2.2.3.1), and to showing the operation of this test in the
presented case study (Sect. 2.2.3.2).
2.2.3.1 Wilcoxon Signed-Ranks Test
This is the analogue of the paired t-test in non-parametric statistical procedures;
therefore, it is a pairwise test that aims to detect significant differences between
two sample means, that is, between the behavior of two algorithms. Let $d_i$ be the
difference between the performance scores of the two classifiers on the $i$-th out of
$N_{ds}$ data sets. The differences are ranked according to their absolute values;
average ranks are assigned in case of ties. Let $R^+$ be the sum of ranks for the data
sets on which the first algorithm outperformed the second, and $R^-$ the sum of ranks
for the opposite. Ranks of $d_i = 0$ are evenly split among the sums; if there is an
odd number of them, one is ignored:
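$$R^+ = \sum_{d_i > 0} \operatorname{rank}(d_i) + \frac{1}{2} \sum_{d_i = 0} \operatorname{rank}(d_i)$$

$$R^- = \sum_{d_i < 0} \operatorname{rank}(d_i) + \frac{1}{2} \sum_{d_i = 0} \operatorname{rank}(d_i)$$

Let $T$ be the smaller of the sums, $T = \min(R^+, R^-)$. If $T$ is less than or equal
to the critical value of the Wilcoxon distribution for $N_{ds}$ degrees of freedom
([32], Table B.12), the null hypothesis of equality of means is rejected.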
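The computation of the two rank sums and of T can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in the case study: the function names `average_ranks` and `wilcoxon_sums` and the sample scores are hypothetical, and it assumes that, when the number of zero differences is odd, the ignored one is dropped before ranking. The resulting T would still have to be compared against a tabulated critical value such as the one in [32].

```python
def average_ranks(values):
    """Rank values in ascending order (1-based); ties share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in order[pos:end + 1]:
            ranks[k] = shared
        pos = end + 1
    return ranks


def wilcoxon_sums(scores_a, scores_b):
    """Return (R+, R-, T) for paired scores of two algorithms on the same data sets."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    # Assumption: with an odd number of zero differences, one is dropped before ranking.
    if sum(1 for d in diffs if d == 0) % 2 == 1:
        diffs.remove(0)
    ranks = average_ranks([abs(d) for d in diffs])
    # Ranks of zero differences are split evenly between the two sums.
    half_zero = 0.5 * sum(r for d, r in zip(diffs, ranks) if d == 0)
    r_plus = sum(r for d, r in zip(diffs, ranks) if d > 0) + half_zero
    r_minus = sum(r for d, r in zip(diffs, ranks) if d < 0) + half_zero
    return r_plus, r_minus, min(r_plus, r_minus)


# Hypothetical accuracies (in percent) of two classifiers on six data sets.
first = [85, 90, 78, 92, 70, 88]
second = [80, 90, 79, 89, 70, 81]
print(wilcoxon_sums(first, second))  # (16.5, 4.5, 4.5)
```

In this example the differences are 5, 0, −1, 3, 0, 7, so the two zeros share rank 1.5 and contribute 1.5 to each sum. The two sums add up to 6·7/2 = 21, which is a quick sanity check on any hand computation.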
The Wilcoxon signed-ranks test is more sensible than the t-test. It assumes
commensurability of differences, but only qualitatively: greater differences still
count more, which is probably desired, but the absolute magnitudes are ignored. From
the statistical point of view, the test is safer since it does not assume normal
distributions. Also, outliers (exceptionally good/bad performances on a few data sets)
have less effect on the Wilcoxon test than on the t-test. The Wilcoxon test assumes
continuous differences $d_i$; therefore, they should not be rounded to one or two
decimals, since this would decrease the power of the test due to a high number of ties.
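The effect of rounding on ties can be illustrated with a small Python sketch. The helper `count_tied_values` and the sample differences are hypothetical and serve only to show how rounding to two decimals turns distinct absolute differences into tied ones.

```python
from collections import Counter


def count_tied_values(diffs):
    """Count differences whose absolute value is shared with at least one other."""
    counts = Counter(abs(d) for d in diffs)
    return sum(c for c in counts.values() if c > 1)


# Hypothetical performance differences between two algorithms on five data sets.
diffs = [0.0123, -0.0149, 0.0251, 0.0098, -0.0302]
rounded = [round(d, 2) for d in diffs]

print(count_tied_values(diffs))    # 0: all absolute differences are distinct
print(count_tied_values(rounded))  # 5: every difference is now tied with another
```

With the unrounded values each data set receives its own rank, whereas after rounding only two distinct absolute values remain, so the ranking carries far less information.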
Please note that when the assumptions of the paired t-test are met, the Wilcoxon
signed-ranks test is less powerful than the paired t-test. On the other hand, when
the assumptions are violated, the Wilcoxon test can be even more powerful than the
t-test. This