Probability and Statistics - Geomathematics: Theoretical Foundations, Applications and Future Developments


observed frequencies is equal to $n$. It can be shown (Cramér 1947) that for this reason one degree of freedom must be subtracted from the number of classes $m$. More degrees of freedom are lost if, in order to obtain the theoretical frequencies ($f_e$), use is made of parameters estimated from the observations; in general, the number of degrees of freedom is to be reduced further by the number of parameters that were estimated. Consequently, the chi-square test of normality has ($m - 3$) degrees of freedom. For the example of Table 2.3 with $\chi^2 = 9.1$, the number of degrees of freedom is 4. From statistical tables, it can be found for $\alpha = 0.05$ that $\chi^2_{0.95}(4) = 9.49$. Hence the normality hypothesis can be accepted. However, it should be kept in mind that $\chi^2_{0.941}(4) = 9.1$. This means that a normal distribution would yield a $\chi^2$ equal to or larger than 9.1 in only 5.9 % of events if this particular experiment were to be repeated a large number of times for the same theoretical distribution.
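The figures quoted above (critical value 9.49 and a tail probability of 5.9 % for 4 degrees of freedom) can be checked without statistical tables. The following is a minimal sketch, not the author's procedure: it relies on the closed-form upper-tail probability of the chi-square distribution that holds only for an even number of degrees of freedom, and the function names `pearson_chi_square`, `chi2_sf_even` and `chi2_critical_even` are illustrative assumptions.

```python
import math


def pearson_chi_square(observed, expected):
    """Pearson statistic: sum over classes of (f_o - f_e)^2 / f_e."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def chi2_sf_even(x, dof):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{k < dof/2} (x/2)^k / k!,
    which is valid only when dof is a positive even integer.
    """
    if dof <= 0 or dof % 2:
        raise ValueError("this sketch only handles positive even dof")
    half = x / 2.0
    terms = (half ** k / math.factorial(k) for k in range(dof // 2))
    return math.exp(-half) * sum(terms)


def chi2_critical_even(alpha, dof, lo=0.0, hi=200.0):
    """Bisection search for the value x whose upper-tail probability is alpha."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, dof) > alpha:
            lo = mid  # tail still too heavy, move right
        else:
            hi = mid
    return (lo + hi) / 2.0


dof = 4                                    # m - 3 in the example of the text
p_value = chi2_sf_even(9.1, dof)           # share of repeats with chi-square >= 9.1
critical = chi2_critical_even(0.05, dof)   # chi-square value exceeded 5 % of the time
print(round(p_value, 3), round(critical, 2))
```

Because 9.1 lies below the critical value, the computed tail probability stays slightly above 0.05, which is the sense in which the hypothesis is only narrowly retained.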

The preceding chi-square test for goodness of fit is well-known. It was originally proposed by Karl Pearson and refined by Ronald Fisher, who determined exactly the number of degrees of freedom to be used. A similar test that is at least as good as the chi-square test is the $G^2$-test (see, e.g., Bishop et al. 1975). Finally, the Kolmogorov-Smirnov test should be mentioned. It consists of determining the largest (positive or negative) difference between theoretical and observed cumulative relative frequencies. In the two-tailed Kolmogorov-Smirnov test, the absolute value of the largest difference should not exceed $1.36/n^{0.5}$ with a probability of 95 % provided that the number of observations exceeds 40. The corresponding limit for the one-tailed test is $1.22/n^{0.5}$.
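As an illustration of the Kolmogorov-Smirnov limits, the sketch below computes the largest difference between an empirical cumulative distribution and a normal one, together with the large-sample 95 % limits quoted in the text. It is a hedged example only: the helper names are assumptions, the normal mean and standard deviation are taken as given inputs, and $n = 76$ is used because that is the number of biotite ages in the example of this chapter.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative normal distribution evaluated with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ks_statistic(sample, mean, sd):
    """Largest absolute difference between empirical and normal CDFs."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mean, sd)
        # The empirical CDF jumps from i/n to (i+1)/n at x: check both sides.
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d


def ks_limit(n, two_tailed=True):
    """95 % limits from the text: 1.36/sqrt(n) (two-tailed), 1.22/sqrt(n) (one-tailed)."""
    return (1.36 if two_tailed else 1.22) / math.sqrt(n)


n = 76  # number of biotite ages in the chapter's example
two_tailed_limit = ks_limit(n)
one_tailed_limit = ks_limit(n, two_tailed=False)
print(round(two_tailed_limit, 3), round(one_tailed_limit, 3))
```

A value returned by `ks_statistic` that stays below `two_tailed_limit` would be consistent with the theoretical distribution at the 95 % level, subject to the condition in the text that the number of observations exceeds 40.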

2.4.2 Q-Q Plots: Normal Distribution Example

Normality can also be tested graphically by means of a so-called Q-Q plot for comparing observed quantiles with theoretical quantiles. When the theoretical frequency distribution is normal, this is the same as using normal probability paper. In Fig. 2.6, the scale along the vertical axis is linear but the horizontal scale has been changed in such a manner that the S-shaped curve for any theoretical cumulative normal distribution plots as a straight line. A normal distribution always becomes a straight line on normal probability paper. Figure 2.6 shows three types of plot for the 76 biotite ages listed in Table 2.3: (1) original data (points); (2) theoretical normal curve (straight line); (3) a 95 % confidence belt on the theoretical normal curve. These three plots have been constructed as follows:

Firstly, cumulative frequencies were determined for the classes of ages shown in Table 2.2. These were converted into cumulative frequency percentage values.
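The conversion from class counts to positions on a normal probability scale can be sketched as follows. This is an illustrative example under stated assumptions: the class counts are hypothetical (they are not the values of Table 2.2), the function names are invented for the sketch, and the probability scale is represented by standard normal quantiles obtained from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist


def cumulative_percentages(counts):
    """Running totals of class counts expressed as percentages of the total."""
    total = sum(counts)
    running = 0
    percentages = []
    for count in counts:
        running += count
        percentages.append(100.0 * running / total)
    return percentages


def probability_scale_positions(percentages):
    """Map cumulative percentages to standard normal quantiles.

    0 % and 100 % have no finite quantile, so they are returned as None.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        std_normal.inv_cdf(p / 100.0) if 0.0 < p < 100.0 else None
        for p in percentages
    ]


# Hypothetical class counts, ordered from the lowest to the highest class.
counts = [1, 4, 10, 15, 12, 6, 2]
percentages = cumulative_percentages(counts)
positions = probability_scale_positions(percentages)
for pct, pos in zip(percentages, positions):
    print(pct, pos)
```

Plotting the upper class limits against these quantile positions yields points that fall near a straight line when the data are approximately normal; the final class, at 100 %, has no finite position.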

If upper class limits are used, it is not possible to plot the value for the 1,200-1,220 Ma class because the last class has a cumulative frequency of 100 %, which is not part of the probability scale. One may omit plotting this last value but a slight
