graph of these pairs approximately forms a straight line, the data are
probably normally distributed. Otherwise, the data distribution may
not be normal.
3.2 Statistical Methods
Judging the normality of time series data from graphical methods alone is
subjective. For extremely non-normal data the decision is easy, but in many
cases it is not straightforward. Therefore, statistical methods are usually
necessary to test the assumption of normality. The statistical methods commonly
used for checking the normality of time series are described in the following
sections. Of the eleven statistical tests discussed below, the
Kolmogorov-Smirnov, Anderson-Darling and Cramér-von Mises tests for normality
are based on the empirical distribution function (EDF) and are often referred
to as EDF tests (Stephens, 1986).
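To make the idea of an EDF-based comparison concrete, the following minimal sketch computes the largest vertical gap between the empirical distribution function of a sample and a fitted normal distribution function, which is the quantity underlying the Kolmogorov-Smirnov statistic. The function name `ks_statistic` and the sample values are hypothetical, chosen only for illustration. The normal parameters are estimated from the sample itself, and no critical values or p-values are computed here.

```python
from statistics import NormalDist, fmean, stdev


def ks_statistic(data, cdf):
    """Largest vertical gap between the EDF of `data` and the model `cdf`."""
    ys = sorted(data)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys, start=1):
        f = cdf(y)
        # The EDF jumps from (i - 1)/n to i/n at y, so check both sides of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical sample values, used only to exercise the function.
sample = [4.1, 5.3, 5.9, 6.2, 6.8, 7.0, 7.4, 8.1, 8.6, 9.9]

# Normal distribution with mean and standard deviation estimated from the sample.
fitted = NormalDist(fmean(sample), stdev(sample))

d_stat = ks_statistic(sample, fitted.cdf)
print(f"D = {d_stat:.4f}")
```

A small value of D means that the EDF stays close to the fitted normal curve over the whole range of the data.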
3.2.1 Chi-Square Test
The 'chi-square test' is used to test whether a sample of data came from a
population with a specific distribution (Snedecor and Cochran, 1980). An
attractive feature of the chi-square goodness-of-fit test is that it can be
applied to any univariate distribution for which the cumulative distribution
function can be calculated. The test is applied to binned data (i.e., data put
into classes). This is not really a restriction, because for non-binned data a
histogram or frequency table can be constructed before applying the test.
However, the value of the chi-square test statistic depends on how the data
are binned (Snedecor and Cochran, 1980). Another disadvantage of this test is
that it requires a sufficiently large sample for the chi-square approximation
to be valid.
To apply the 'chi-square test', the time series data are divided into k bins
and the test statistic is defined as follows (Snedecor and Cochran, 1980):
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \tag{1}$$
where $O_i$ = observed frequency for bin $i$ and $E_i$ = expected frequency for
bin $i$. The expected frequency is calculated as

$$E_i = N \{ F(Y_U) - F(Y_L) \} \tag{2}$$

where $F$ = cumulative distribution function for the distribution being tested,
$Y_U$ = upper limit for class $i$, $Y_L$ = lower limit for class $i$ and $N$ = size of
the sample.
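Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation. The function name `chi_square_statistic`, the synthetic sample, and the choice of bin edges at multiples of the sample standard deviation are all hypothetical. The outermost bins are left open-ended so that the expected frequencies sum to N. Only the test statistic is computed; comparison with a chi-square critical value is left out.

```python
import random
from statistics import NormalDist, fmean, stdev


def chi_square_statistic(data, edges, dist):
    """Return (chi2, observed, expected) for the bins defined by `edges`."""
    n = len(data)
    k = len(edges) - 1
    # Observed frequencies O_i: bin i covers the interval (edges[i], edges[i + 1]].
    observed = [0] * k
    for y in data:
        for i in range(k):
            if edges[i] < y <= edges[i + 1]:
                observed[i] += 1
                break
    # Expected frequencies, Eq. (2): E_i = N {F(Y_U) - F(Y_L)}.
    expected = [n * (dist.cdf(edges[i + 1]) - dist.cdf(edges[i])) for i in range(k)]
    # Test statistic, Eq. (1): sum over bins of (O_i - E_i)^2 / E_i.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, observed, expected


# Synthetic sample (an assumption for illustration): 200 normal variates.
rng = random.Random(42)
sample = [rng.gauss(10.0, 2.0) for _ in range(200)]

# Normal distribution being tested, with parameters estimated from the sample.
fitted = NormalDist(fmean(sample), stdev(sample))
m, s = fitted.mean, fitted.stdev
inf = float("inf")

# Two hypothetical binnings of the same data: k = 5 and k = 8 classes.
edges_coarse = [-inf, m - 1.5 * s, m - 0.5 * s, m + 0.5 * s, m + 1.5 * s, inf]
edges_fine = [-inf] + [m + z * s for z in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5)] + [inf]

chi2, observed, expected = chi_square_statistic(sample, edges_coarse, fitted)
chi2_fine, observed_fine, expected_fine = chi_square_statistic(sample, edges_fine, fitted)

print(f"k = 5: chi-square = {chi2:.3f}")
print(f"k = 8: chi-square = {chi2_fine:.3f}")
```

Running the same sample through the two binnings generally gives different values of the statistic, which illustrates the dependence on how the data are binned.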
The test-statistic approximately follows a chi-square distribution with
( k - c ) degrees of freedom, where k is the number of non-empty cells and c is