Table 2.3 Levene's test of heteroscedasticity (based on means)

Cleveland    Glass        Iris         Pima         Wine         Wisconsin
(0.000) a    (0.000) a    (0.000) a    (0.003) a    (0.000) a    (0.000) a

a indicates that homoscedasticity is not satisfied
The third condition that must be fulfilled is homoscedasticity. Applying Levene's
test to the samples of the six data sets yields the results in Table 2.3.
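As an illustration of what the test computes, the following is a minimal sketch of the mean-based Levene statistic W, written with the Python standard library only. The function name levene_w and the sample values in the usage note are hypothetical and are not the results behind Table 2.3. The p-value, which this sketch does not compute, would come from comparing W with an F distribution with (k - 1, N - k) degrees of freedom; SciPy's scipy.stats.levene with center='mean' is one existing implementation that reports it.

```python
import statistics


def levene_w(*samples):
    """Return the mean-based Levene statistic W for two or more samples."""
    k = len(samples)
    if k < 2:
        raise ValueError("at least two samples are required")
    n_total = sum(len(s) for s in samples)

    # Absolute deviation of every observation from its own group mean.
    z = []
    for s in samples:
        group_mean = statistics.fmean(s)
        z.append([abs(x - group_mean) for x in s])

    z_group_means = [statistics.fmean(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total

    # Between-group and within-group sums of squares of the deviations.
    between = sum(
        len(zi) * (m - z_grand_mean) ** 2 for zi, m in zip(z, z_group_means)
    )
    within = sum(
        (v - m) ** 2 for zi, m in zip(z, z_group_means) for v in zi
    )
    if within == 0:
        raise ValueError("W is undefined: no variation within groups")

    return (n_total - k) / (k - 1) * between / within
```

For example, levene_w([1, 2, 3], [0, 10, 20]) compares a tightly grouped sample with a widely spread one; a large W (small p-value) speaks against equal variances.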
Clearly, in both cases, the non-fulfillment of the normality and homoscedasticity
conditions is perceptible. In most cases, the normality condition is not verified
in a single-problem analysis. Homoscedasticity also depends on the number
of algorithms studied, because it checks the relationship among the variances of all
population samples. Even though here we analyze this condition only in the results
for two algorithms, the condition is also not fulfilled in many other cases.
Obtaining results in a single data set analysis when using stochastic ML
algorithms is relatively easy, because new results can be obtained from new
runs of the algorithms. In spite of this, a sample of 50 results, which should be
large enough to fulfill the parametric conditions, does not always satisfy the
requirements for applying parametric tests, as we saw in the previous section.
On the other hand, other ML approaches are not stochastic and it is not possible to
obtain a larger sample of results. This makes the comparison between stochastic ML
methods and deterministic ML algorithms difficult, given that the sample of results
might not be large enough, or it might be necessary to use procedures that can
operate with samples of different sizes.
For all these reasons, the use of non-parametric tests for comparing ML algorithms
is recommended [ 5 ].
2.2.3 Non-parametric Tests for Comparing Two Algorithms
in Multiple Data Set Analysis
Authors are usually familiar with parametric tests for pairwise comparisons.
ML approaches have been compared through parametric tests by means of paired t
tests.
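To make the paired t test concrete, here is a minimal sketch that computes its statistic from the per-data-set differences between two algorithms, using only the Python standard library. The function name paired_t_statistic is hypothetical, and the sketch stops at the statistic and its degrees of freedom; obtaining a p-value would require the Student t distribution, which is assumed to come from a statistics package.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for two paired lists of scores.

    scores_a[i] and scores_b[i] are assumed to be the results of the two
    algorithms on the same data set i.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")

    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("t is undefined: all differences are identical")

    # t = mean difference divided by its standard error.
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1
```

The test relies on the differences being approximately normal, which is exactly the kind of parametric condition discussed above.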
In some cases, the t test is accompanied by the non-parametric Wilcoxon test
applied over multiple data sets. The use of these types of tests is correct when we
are interested in finding the differences between two methods, but they must not be
used when we are interested in comparisons that include several methods. When
pairwise comparisons are repeated, there is an associated error that grows as the
number of comparisons increases, called the family-wise error rate (FWER),
defined as the probability of at least one error in the family of hypotheses. To solve
this problem, some authors use the Bonferroni correction for applying the paired t test
in their works [ 27 ], although this is not recommended.
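The growth of the FWER can be sketched numerically. The following minimal Python example assumes, for simplicity, that the comparisons are independent and each is run at the same significance level alpha, in which case the probability of at least one false rejection is 1 - (1 - alpha)^m for m comparisons. That independence assumption and the function names are illustrative choices, not part of the text above. The sketch also shows the per-comparison level that the Bonferroni correction would use.

```python
def pairwise_comparisons(n_algorithms):
    """Number of distinct pairs among n_algorithms methods."""
    return n_algorithms * (n_algorithms - 1) // 2


def family_wise_error_rate(alpha, n_comparisons):
    """FWER for n_comparisons tests at level alpha.

    Assumes the tests are independent (an illustrative simplification).
    """
    return 1.0 - (1.0 - alpha) ** n_comparisons


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison level used by the Bonferroni correction."""
    return alpha / n_comparisons


if __name__ == "__main__":
    alpha = 0.05
    for k in (2, 3, 5, 10):
        m = pairwise_comparisons(k)
        print(k, m, round(family_wise_error_rate(alpha, m), 3),
              bonferroni_alpha(alpha, m))
```

Under these assumptions, comparing 5 algorithms pairwise means 10 tests, and the FWER at alpha = 0.05 is already about 0.40.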
 
 