In this section we present a methodology for analyzing the results of a pair of
algorithms in a given study by using non-parametric tests in a multiple data set
analysis. We also comment on the possibility of comparison with other deterministic
ML algorithms. Non-parametric tests can be applied to small samples of data, and
their effectiveness has been proven in complex experiments. They are preferable to
adjusting the data with transformations or discarding certain extreme observations
(outliers) [16].
This section describes a non-parametric statistical procedure for performing
pairwise comparisons between two algorithms, known as the Wilcoxon signed-ranks
test (Sect. 2.2.3.1), and shows the operation of this test in the presented case
study (Sect. 2.2.3.2).
2.2.3.1 Wilcoxon Signed-Ranks Test
This is the analogue of the paired t-test in non-parametric statistical procedures;
therefore, it is a pairwise test that aims to detect significant differences between
two sample means, that is, the behavior of two algorithms. Let $d_i$ be the difference
between the performance scores of the two classifiers on the $i$-th of $N_{ds}$ data
sets. The differences are ranked according to their absolute values; average ranks are
assigned in case of ties. Let $R^+$ be the sum of ranks for the data sets on which the
first algorithm outperformed the second, and $R^-$ the sum of ranks for the opposite.
Ranks of $d_i = 0$ are evenly split among the sums; if there is an odd number of them,
one is ignored:

$$R^+ = \sum_{d_i > 0} \operatorname{rank}(d_i) + \frac{1}{2} \sum_{d_i = 0} \operatorname{rank}(d_i)$$

$$R^- = \sum_{d_i < 0} \operatorname{rank}(d_i) + \frac{1}{2} \sum_{d_i = 0} \operatorname{rank}(d_i)$$

Let $T$ be the smaller of the sums, $T = \min(R^+, R^-)$. If $T$ is less than or equal
to the value of the distribution of Wilcoxon for $N_{ds}$ degrees of freedom ([32], Table
B.12), the null hypothesis of equality of means is rejected.
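
The computation of the rank sums and of T described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not a reference implementation: the function name `signed_rank_sums`, its arguments, and the example scores are introduced here and do not come from the case study. The critical value still has to be looked up in a table such as the one cited in the text.

```python
def signed_rank_sums(scores_a, scores_b):
    """Return (R_plus, R_minus, T) for paired scores of two algorithms.

    Hypothetical helper: scores_a[i] and scores_b[i] are the performance
    scores of the first and second algorithm on the i-th data set.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]

    # If the number of zero differences is odd, one of them is ignored.
    n_zero = sum(1 for d in diffs if d == 0)
    if n_zero % 2 == 1:
        diffs.remove(0)

    # Rank by absolute value; tied values share the average of their ranks.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1

    # Ranks of zero differences are split evenly between the two sums.
    zero_share = 0.5 * sum(r for d, r in zip(diffs, ranks) if d == 0)
    r_plus = sum(r for d, r in zip(diffs, ranks) if d > 0) + zero_share
    r_minus = sum(r for d, r in zip(diffs, ranks) if d < 0) + zero_share
    return r_plus, r_minus, min(r_plus, r_minus)


# Invented accuracies (in percent) on five data sets, for illustration only.
r_plus, r_minus, t = signed_rank_sums([90, 80, 70, 60, 50], [80, 80, 75, 40, 50])
print(r_plus, r_minus, t)  # 10.5 4.5 4.5
```

In this example the differences are 10, 0, -5, 20 and 0. The two zero differences share ranks 1 and 2 (1.5 each), and their total of 3 is split evenly, so each sum receives 1.5. The two sums always add up to n(n+1)/2, where n is the number of differences kept, which gives a quick sanity check (here 15).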
The Wilcoxon signed-ranks test is more sensible than the t-test. It assumes
commensurability of differences, but only qualitatively: greater differences still count
more, which is probably desired, but the absolute magnitudes are ignored. From the
statistical point of view, the test is safer since it does not assume normal distributions.
Also, outliers (exceptionally good/bad performances on a few data sets) have less
effect on the Wilcoxon test than on the t-test. The Wilcoxon test assumes continuous
differences $d_i$; therefore they should not be rounded to one or two decimals, since
this would decrease the power of the test due to a high number of ties.
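
The effect of rounding on ties can be seen in a small Python sketch. The differences below are invented for illustration, and `distinct_magnitudes` is a hypothetical helper that simply counts how many different absolute values remain to be ranked.

```python
def distinct_magnitudes(diffs):
    """Count the distinct absolute values among the differences."""
    return len({abs(d) for d in diffs})


# Hypothetical accuracy differences between two algorithms on six data sets.
raw = [0.0123, 0.0181, -0.0214, 0.0337, -0.0082, 0.0458]
two_decimals = [round(d, 2) for d in raw]
one_decimal = [round(d, 1) for d in raw]

print(distinct_magnitudes(raw))           # 6: every difference gets its own rank
print(distinct_magnitudes(two_decimals))  # 4: two pairs of differences are now tied
print(distinct_magnitudes(one_decimal))   # 1: all differences collapse to zero
```

With one decimal every difference becomes zero, so the ranks carry no information about which algorithm performed better.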
Note that when the assumptions of the paired t-test are met, the Wilcoxon
signed-ranks test is less powerful than the paired t-test. On the other hand, when the
assumptions are violated, the Wilcoxon test can be even more powerful than the t-test. This