for training and testing in different partitions). Hold-out partitions can safely be taken
as independent, since the training and test partitions do not overlap.
The independence of the results is usually evident, given that they come from
independent runs of the algorithm with randomly generated initial seeds.
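As a minimal sketch of the two conditions just described, the following Python code draws one hold-out split per run, each from its own seed. It assumes a hypothetical data set of `n_instances` examples and an illustrative 30% test fraction. The helper names `holdout_split` and `independent_runs` are hypothetical and do not come from any library.

```python
import random

def holdout_split(n_instances, test_fraction, seed):
    """Shuffle instance indices and cut them into disjoint train/test partitions."""
    rng = random.Random(seed)              # private generator for this run only
    indices = list(range(n_instances))
    rng.shuffle(indices)
    n_test = int(round(n_instances * test_fraction))
    return indices[n_test:], indices[:n_test]   # (train, test)

def independent_runs(n_instances, n_runs, test_fraction=0.3, base_seed=0):
    """One hold-out split per run, each driven by its own seed."""
    return [holdout_split(n_instances, test_fraction, base_seed + r)
            for r in range(n_runs)]

if __name__ == "__main__":
    for train, test in independent_runs(n_instances=100, n_runs=5):
        # within a run, no instance is used for both training and testing
        print(len(train), len(test), set(train).isdisjoint(test))
```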
In the following, we show a normality analysis using the Kolmogorov-Smirnov,
Shapiro-Wilk and D'Agostino-Pearson tests, together with a heteroscedasticity
analysis using Levene's test, in order to show the reader how to check these properties.
2.2.2 Normality Test over the Group of Data Sets and Algorithms
Let us consider a small case study in which we take into account a stochastic
algorithm that needs a seed to generate its model. A classic example of this type
of algorithm is the multilayer perceptron (MLP). Using a small set of 6 well-known
classification problems, we aim to analyze whether the conditions required to safely
perform a parametric statistical analysis hold. We have used a 10-fold cross-validation
(10-FCV) scheme in which the MLP is run 5 times per fold, thus obtaining 50 results
per data set. Please note that using k-FCV means that independence does not hold.
However, since it is the most common validation scheme used in classification, this
case study remains relevant.
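The protocol of 10-FCV with 5 runs per fold can be sketched as follows. The sketch assumes a hypothetical `evaluate(train, test, seed)` callback that trains the stochastic model (an MLP in the case study) and returns its test accuracy. Here that callback is replaced by a dummy that only returns a seeded random number.

The last lines show why independence does not hold. Any two training sets share (k - 2)/k of the data, which is 80% when k = 10.

```python
import random

def kfold_indices(n_instances, k, seed):
    """Shuffle instance indices and deal them into k disjoint folds."""
    rng = random.Random(seed)
    indices = list(range(n_instances))
    rng.shuffle(indices)
    return [indices[f::k] for f in range(k)]

def repeated_kfcv(n_instances, k, runs_per_fold, evaluate, seed=0):
    """Evaluate a stochastic learner runs_per_fold times on each of the k folds."""
    folds = kfold_indices(n_instances, k, seed)
    results = []
    for f, test in enumerate(folds):
        # training data = every fold except the current test fold
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        for r in range(runs_per_fold):
            # a distinct seed per run, for the learner's random initialisation
            results.append(evaluate(train, test, seed=1000 * f + r))
    return results

def dummy_evaluate(train, test, seed):
    """Hypothetical stand-in for training an MLP and measuring test accuracy."""
    return random.Random(seed).uniform(0.7, 0.9)

if __name__ == "__main__":
    results = repeated_kfcv(150, k=10, runs_per_fold=5, evaluate=dummy_evaluate)
    print(len(results))                       # 10 folds x 5 runs = 50
    folds = kfold_indices(100, 10, seed=0)
    train0 = set(range(100)) - set(folds[0])
    train1 = set(range(100)) - set(folds[1])
    print(len(train0 & train1))               # 80: the training sets overlap
```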
First of all, we want to check whether our samples follow a normal distribution.
Table 2.2 shows the p-values obtained by the normality tests described in the previous
section. As we can observe, in many cases the normality assumption does not hold
(indicated by an “a” in the table).
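To make one of these tests concrete, the following minimal sketch computes the D'Agostino-Pearson omnibus statistic K² = Z(√b1)² + Z(b2)². The sample skewness and kurtosis are first mapped to approximately standard normal scores: D'Agostino's transformation is used for skewness and the Anscombe-Glynn transformation for kurtosis, as in `scipy.stats.normaltest`. K² is then referred to a χ² distribution with 2 degrees of freedom.

The function names are hypothetical. The `flag` helper mimics the table's “a” marker, and its significance level α = 0.05 is an assumption, since the table does not state it.

```python
import math

def _moments(x):
    """Sample size and biased central moments m2, m3, m4."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return n, m2, m3, m4

def skew_z(x):
    """D'Agostino's normal approximation for the sample skewness."""
    n, m2, m3, _ = _moments(x)
    if n < 8:
        raise ValueError("the skewness approximation needs at least 8 values")
    g1 = m3 / m2 ** 1.5
    y = g1 * math.sqrt((n + 1) * (n + 3) / (6.0 * (n - 2)))
    beta2 = (3.0 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
             / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))
    w2 = -1.0 + math.sqrt(2.0 * (beta2 - 1.0))
    delta = 1.0 / math.sqrt(0.5 * math.log(w2))
    alpha = math.sqrt(2.0 / (w2 - 1.0))
    return delta * math.asinh(y / alpha)

def kurtosis_z(x):
    """Anscombe-Glynn normal approximation for the sample kurtosis."""
    n, m2, _, m4 = _moments(x)
    b2 = m4 / m2 ** 2
    mean_b2 = 3.0 * (n - 1) / (n + 1)
    var_b2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    xs = (b2 - mean_b2) / math.sqrt(var_b2)
    sqrt_beta1 = (6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
                  * math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6.0 + 8.0 / sqrt_beta1 * (2.0 / sqrt_beta1
                                  + math.sqrt(1.0 + 4.0 / sqrt_beta1 ** 2))
    denom = 1.0 + xs * math.sqrt(2.0 / (a - 4.0))
    if denom == 0.0:
        raise ValueError("kurtosis transformation undefined for this sample")
    term2 = math.copysign(((1.0 - 2.0 / a) / abs(denom)) ** (1.0 / 3.0), denom)
    return ((1.0 - 2.0 / (9.0 * a)) - term2) / math.sqrt(2.0 / (9.0 * a))

def dagostino_pearson(x):
    """Omnibus K^2 statistic and its chi-square (2 df) p-value."""
    k2 = skew_z(x) ** 2 + kurtosis_z(x) ** 2
    return k2, math.exp(-k2 / 2.0)     # chi2(2) survival function: exp(-k2/2)

def flag(p, alpha=0.05):
    """Format a p-value, appending 'a' when normality is rejected at alpha."""
    return f"{p:.2f}" + (" a" if p < alpha else "")

if __name__ == "__main__":
    skewed = [float(i ** 3) for i in range(1, 51)]   # 50 values, clearly skewed
    k2, p = dagostino_pearson(skewed)
    print(round(k2, 2), flag(p))
```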
In addition to this general study, we show the sample distribution in three cases,
with the objective of illustrating representative cases in which the normality tests
obtain different results.
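One reason these tests can disagree on the same sample is that each one summarizes the departure from normality differently:

- Kolmogorov-Smirnov looks at the largest gap between the empirical and the normal cumulative distribution functions.
- Shapiro-Wilk looks at how well the ordered sample correlates with the expected normal order statistics.
- D'Agostino-Pearson looks at skewness and kurtosis.

The sketch below computes only the Kolmogorov-Smirnov distance D, using Python's `statistics.NormalDist`. The name `ks_distance` is hypothetical. When the mean and standard deviation are estimated from the same sample (the default here), the classical KS critical values are too conservative and the Lilliefors correction applies. For that reason no p-value is computed.

```python
from statistics import NormalDist, fmean, stdev

def ks_distance(sample, mu=None, sigma=None):
    """Largest vertical gap between the sample's ECDF and a normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    # estimate the normal parameters from the sample unless they are given
    mu = fmean(xs) if mu is None else mu
    sigma = stdev(xs) if sigma is None else sigma
    normal = NormalDist(mu, sigma)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal.cdf(x)
        # the ECDF steps from (i - 1)/n to i/n at x: check both sides of the step
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

if __name__ == "__main__":
    std = NormalDist()
    # a "perfectly normal" sample: standard normal quantiles at (i - 0.5)/n
    ideal = [std.inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
    print(round(ks_distance(ideal, 0.0, 1.0), 4))   # 0.025, i.e. 0.5/n
    skewed = [float(i ** 3) for i in range(1, 21)]
    print(ks_distance(skewed))                      # compare with the ideal case
```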
Figures 2.4 to 2.6 show different examples of histograms and Q-Q graphics. A
histogram represents a statistical variable by using bars, so that the area of each bar
is proportional to the frequency of the represented values. A Q-Q graphic compares
the quantiles of the observed data with those of a normal distribution.
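Both graphical tools can be reduced to the numbers behind them. In this minimal sketch, `histogram_counts` returns the bar heights of a histogram with equal-width bins, so each bar's area is proportional to its count. `qq_points` pairs each sorted observation with the standard normal quantile at the same cumulative probability; if the sample is normal, these pairs lie close to a straight line.

The helper names are hypothetical. The plotting positions (i - 0.5)/n are one of several conventions in use.

```python
from statistics import NormalDist

def histogram_counts(sample, n_bins):
    """Count observations in n_bins equal-width bins spanning the sample range."""
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / n_bins or 1.0       # guard against a constant sample
    counts = [0] * n_bins
    for x in sample:
        # the maximum would fall just past the last edge, so clamp it inside
        counts[min(int((x - lo) / width), n_bins - 1)] += 1
    return counts

def qq_points(sample):
    """Pair sorted observations with standard normal quantiles for a Q-Q graphic."""
    xs = sorted(sample)
    n = len(xs)
    std = NormalDist()
    theoretical = [std.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, xs))

if __name__ == "__main__":
    print(histogram_counts([1, 2, 2, 3, 3, 3], n_bins=3))   # [1, 2, 3]
    for q, x in qq_points([3.0, 1.0, 2.0]):
        print(f"{q:+.3f}  {x}")
```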
In Fig. 2.4 we can observe a general case in which the lack of normality is clearly
visible. In contrast, Fig. 2.5 illustrates a sample whose distribution follows a normal
shape, and the three normality tests employed verified
Table 2.2 Normality test applied to a sample case
Cleveland
Glass
Iris
Pima
Wine
Wisconsin
0.00 a
0.00 a
0.00 a
0.09 a
Kolmogorov-Smirnov
0.09
0.20
0.00 a
0.00 a
0.00 a
0.02 a
Shapiro-Wilk
0.04
0.80
0.01 a
0.02 a
0.00 a
D'Agostino-Pearson
0.08
0.51
0.27
a indicates that normality is not satisfied