the “no free lunch” theorem) than to work with partial knowledge about the problem,
knowledge that allows us to design algorithms with specific characteristics which
can make them more suitable for solving the problem.
2.2.1 Conditions for the Safe Use of Parametric Tests
In [24] the distinction between parametric and non-parametric tests is based on the
level of measurement represented by the data to be analyzed. That is, a parametric
test usually operates on data composed of real values.
However, having this type of data does not always mean that a parametric test
should be used. Other initial assumptions must be fulfilled for the safe use of
parametric tests; the non-fulfillment of these conditions might cause a statistical
analysis to lose credibility.
The following conditions are needed in order to safely carry out parametric tests
[24, 32]:
Independence: In statistics, two events are independent when the fact that one
occurs does not modify the probability of the other one occurring.
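In formal terms, this is the standard product rule for independent events:

$$P(A \cap B) = P(A)\,P(B), \qquad \text{equivalently} \qquad P(A \mid B) = P(A).$$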
Normality: An observation is normal when its behaviour follows a normal or
Gaussian distribution with a certain mean μ and variance σ². A normality
test applied over a sample can indicate the presence or absence of this condition
in observed data. Three normality tests are usually used to check whether
normality is present or not (a usage sketch with SciPy follows the list):
- Kolmogorov-Smirnov: compares the cumulative distribution of the observed data
with the cumulative distribution expected from a Gaussian distribution, obtaining
the p-value from the discrepancy between them.
- Shapiro-Wilk: analyzes the observed data to compute its level of symmetry and
kurtosis (shape of the curve), then measures the difference with respect to
a Gaussian distribution, obtaining the p-value from the sum of the squares of
the discrepancies.
- D'Agostino-Pearson: first computes the skewness and kurtosis to quantify how
far the distribution is from Gaussian in terms of asymmetry and shape. It then
calculates how far each of these values differs from the value expected under
a Gaussian distribution, and computes a single p-value from the sum of the
discrepancies.
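As an illustration, all three tests are available in SciPy. The following sketch (the sample data and the 0.05 significance level are illustrative assumptions, not taken from the text) applies them to a single sample of results:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.8, scale=0.05, size=30)  # e.g. 30 accuracy results

# Kolmogorov-Smirnov against a Gaussian fitted to the sample.
# Note: estimating the mean and std from the same sample makes this p-value
# optimistic; Lilliefors' correction is the usual refinement.
ks = stats.kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1)))

# Shapiro-Wilk
sw = stats.shapiro(sample)

# D'Agostino-Pearson (scipy.stats.normaltest combines skewness and kurtosis)
dp = stats.normaltest(sample)

for name, res in [("Kolmogorov-Smirnov", ks), ("Shapiro-Wilk", sw),
                  ("D'Agostino-Pearson", dp)]:
    verdict = "normality not rejected" if res.pvalue > 0.05 else "normality rejected"
    print(f"{name}: p = {res.pvalue:.4f} -> {verdict}")
```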
Heteroscedasticity: This property indicates a violation of the hypothesis of
equality of variances. Levene's test is used to check whether or not k samples
present this homogeneity of variances (homoscedasticity). When the observed data
do not fulfill the normality condition, its result is more reliable than that of
Bartlett's test [32], which checks the same property.
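Both checks can be sketched with SciPy as follows (the three synthetic samples, one with a deliberately larger variance, are assumptions for illustration only):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.80, 0.05, 30)
b = rng.normal(0.78, 0.05, 30)
c = rng.normal(0.82, 0.10, 30)  # deliberately larger variance

# Levene's test: robust when the samples depart from normality
lev = stats.levene(a, b, c)

# Bartlett's test: checks the same hypothesis but assumes normality
bar = stats.bartlett(a, b, c)

print(f"Levene:   p = {lev.pvalue:.4f}")
print(f"Bartlett: p = {bar.pvalue:.4f}")
# A small p-value rejects homoscedasticity (equality of variances).
```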
With respect to the independence condition, Demšar suggests in [5] that indepen-
dence is not truly verified in k-FCV and 5×2 CV (a portion of samples is used
either for training or for testing in different partitions).
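A minimal sketch, assuming scikit-learn's KFold (a library choice not made in the original text), makes this lack of independence explicit: every pair of training partitions in k-FCV shares most of its samples.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20)  # indices standing in for a dataset of 20 samples
folds = [set(X[train]) for train, _ in KFold(n_splits=5).split(X)]

# In 5-FCV each training partition holds 80% of the data, so any two
# of them overlap on 75% of their samples: the folds are not independent.
for i in range(len(folds)):
    for j in range(i + 1, len(folds)):
        shared = len(folds[i] & folds[j]) / len(folds[i])
        print(f"training partitions {i} and {j} share {shared:.0%} of their samples")
```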
 