the number of estimated parameters (including location, scale and shape
parameters) for the distribution plus 1. For example, for a 3-parameter Weibull
distribution, c = 4. Therefore, the hypothesis that the data are from a population
with the specified distribution is rejected, if
$$\chi^2 > \chi^2_{(\alpha,\, k-c)}$$
where $\chi^2_{(\alpha,\, k-c)}$ is the critical test-statistic value with
k - c degrees of freedom and a significance level of α.
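The rejection rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the statistic is taken to have the usual Pearson form, the sum over bins of (observed - expected)^2 / expected, the function names and the bin counts are hypothetical, and the critical value is read from a standard chi-square table rather than computed.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum over bins of (O - E)^2 / E (assumed form)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(n_bins, n_estimated_params):
    """Degrees of freedom k - c, where c = number of estimated parameters + 1."""
    c = n_estimated_params + 1
    return n_bins - c


def reject_distribution(observed, expected, critical_value):
    """Reject the hypothesised distribution if the statistic exceeds the critical value."""
    return chi_square_statistic(observed, expected) > critical_value


# Hypothetical counts in k = 5 bins for a fully specified distribution
# (no estimated parameters, so c = 1 and the test has 4 degrees of freedom).
observed = [18, 22, 30, 17, 13]
expected = [20.0, 20.0, 20.0, 20.0, 20.0]
df = degrees_of_freedom(len(observed), n_estimated_params=0)

# Tabulated chi-square critical value for alpha = 0.05 and 4 degrees of freedom.
critical_value = 9.488

statistic = chi_square_statistic(observed, expected)
rejected = reject_distribution(observed, expected, critical_value)
```

For a 3-parameter Weibull distribution fitted to data grouped into 10 bins, the same helper gives degrees_of_freedom(10, 3) = 6, in line with c = 4.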
As mentioned above, the 'chi-square test' is sensitive to the choice of
bins. There is no single optimal choice for the bin width because the optimal
bin width depends on the distribution (Snedecor and Cochran, 1980). It should
be noted that the 'chi-square test' is an alternative to the Anderson-Darling
and Kolmogorov-Smirnov tests. The 'chi-square test' can be applied to discrete
distributions such as the binomial and Poisson distributions, whereas the
application of the Kolmogorov-Smirnov and Anderson-Darling tests is restricted
to continuous distributions. For the chi-square approximation to be valid, the
expected frequency in each bin should be at least 5 (Snedecor and Cochran,
1980). Generally, this test is not valid for small samples, and if some of the
counts are less than 5, it may be necessary to combine some bins in the tails.
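Combining sparse tail bins can be sketched as follows. The helper name, the sample counts, and the decision to apply the threshold of 5 to the expected (rather than observed) counts are assumptions made for illustration. Only the two tails are merged; sparse interior bins are left untouched.

```python
def merge_tail_bins(observed, expected, min_expected=5.0):
    """Merge bins inward from each tail until both end bins reach min_expected.

    Returns new (observed, expected) lists; the inputs are not modified.
    """
    obs = list(observed)
    exp = list(expected)

    # Lower tail: fold the first bin into its right-hand neighbour.
    while len(exp) > 1 and exp[0] < min_expected:
        exp[1] += exp[0]
        obs[1] += obs[0]
        del exp[0], obs[0]

    # Upper tail: fold the last bin into its left-hand neighbour.
    while len(exp) > 1 and exp[-1] < min_expected:
        exp[-2] += exp[-1]
        obs[-2] += obs[-1]
        del exp[-1], obs[-1]

    return obs, exp


# Hypothetical observed and expected counts in 7 bins.
observed = [1, 3, 10, 20, 12, 3, 1]
expected = [1.5, 4.0, 11.0, 18.0, 10.5, 3.5, 1.5]

merged_observed, merged_expected = merge_tail_bins(observed, expected)
```

Merging reduces the number of bins k (here from 7 to 5), so the degrees of freedom k - c must be recomputed from the merged bins before the critical value is looked up.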
3.2.2 Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov (K-S) test is an empirical distribution function
(EDF) test in which the theoretical cumulative distribution function of the test
distribution is compared with the EDF of the time series data (Conover, 1980;
Armitage and Colton, 1998a). The K-S test was first proposed by Kolmogorov
and then modified by Smirnov. This test finds the largest difference between
the empirical cumulative distribution of the time series data and the expected
cumulative normal distribution, and computes the P-value for this largest
discrepancy. The test-statistic is defined as (Massey Jr., 1967):
$$D = \sup_x \left| F_n(x) - F(x, \mu, s) \right| \qquad (3)$$
where $F(x, \mu, s)$ is the theoretical cumulative distribution function of
the normal distribution and $F_n(x)$ is the empirical distribution function of
the data.
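For a finite sample, the supremum is attained at one of the data points, just before or just after a step of the empirical distribution function. A minimal sketch of the computation, using only the Python standard library, is given below; the function name and the sample values are hypothetical, and the mean and standard deviation are assumed to be specified in advance rather than estimated from the data.

```python
from statistics import NormalDist


def ks_statistic_normal(data, mu, s):
    """Largest absolute gap between the EDF of data and the normal CDF F(x, mu, s)."""
    if not data:
        raise ValueError("data must contain at least one value")
    dist = NormalDist(mu, s)
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # The EDF jumps from (i - 1)/n to i/n at x, so check both sides of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical sample, tested against a normal distribution with mu = 10 and s = 2.
sample = [7.9, 9.1, 9.6, 10.2, 10.4, 11.3, 12.8]
d_value = ks_statistic_normal(sample, mu=10.0, s=2.0)
```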
Large values of D indicate the presence of non-normality in the time series.
The table of critical values $D_\alpha(n)$ of the distribution of D for
various sample sizes (n) and significance levels (α) is given in Massey Jr.
(1967). If the population parameters (i.e., μ and s) are known, the original
K-S test can be used. However, if they are not known, they can be replaced by
sample estimates (Massey Jr., 1967; Conover, 1980).
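The comparison of D with a critical value can be sketched as follows. The coefficients used here are the commonly quoted large-sample approximations (for n greater than 35), not values taken from this text, and should be checked against the published table; the function names are hypothetical. The sketch assumes the population parameters are known in advance.

```python
import math

# Commonly quoted large-sample coefficients: D_alpha(n) is roughly coefficient / sqrt(n).
# These are an assumption of this sketch and apply only for n > 35.
LARGE_N_COEFFICIENTS = {0.10: 1.22, 0.05: 1.36, 0.01: 1.63}


def ks_critical_value_large_n(n, alpha=0.05):
    """Approximate critical value of D for a large sample of size n."""
    if n <= 35:
        raise ValueError("use tabulated critical values for n <= 35")
    if alpha not in LARGE_N_COEFFICIENTS:
        raise ValueError("alpha must be one of 0.10, 0.05 or 0.01")
    return LARGE_N_COEFFICIENTS[alpha] / math.sqrt(n)


def reject_normality(d_statistic, n, alpha=0.05):
    """Reject normality when D exceeds the approximate critical value."""
    return d_statistic > ks_critical_value_large_n(n, alpha)


# For n = 100 and alpha = 0.05 the approximate critical value is 1.36 / 10 = 0.136.
critical = ks_critical_value_large_n(100, alpha=0.05)
```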
It is worth mentioning that the K-S test has been strongly criticized by
researchers because it can give ambiguous results (Steinskog et al., 2007).
In particular, conclusions based on not rejecting normality could be very
misleading. D'Agostino (1986) emphasized that the K-S test should not be
applied if the population parameters have to be estimated (the usual case).