Fig. 2.7 Probability plot with outliers identified
The expected value of the squared difference from the mean is known as the variance ($\sigma^2$). It is written:

$$
\begin{aligned}
\mathrm{Var}\{Z\} &= E\{[Z - m]^2\} = \sigma^2 \\
&= E\{Z^2 - 2Zm + m^2\} \\
&= E\{Z^2\} - 2m\,E\{Z\} + m^2
\end{aligned}
$$

Other measures of spread include the interquartile range (described above) and the mean absolute deviation (MAD). These measures are not used extensively.

2.2.5 Extreme Values—Outliers
A small number of very low or very high values may strongly affect summary statistics such as the mean or variance of the data, the correlation coefficient, and measures of spatial continuity. If they are proven to be erroneous, they should be removed from the data. Extreme values that are valid samples can be handled in different ways: (1) classify them into a separate statistical population for special processing, or (2) use robust statistics, which are less sensitive to extreme values. These options can be used at different times in mineral resource estimation. As a general principle, the data should not be modified unless they are known to be erroneous, although their influence in spatial predictive models may be restricted.
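As a rough illustration of option (2), the sketch below compares classical and robust summaries of a small sample before and after one extreme value is added. The sample values and the `summarize` helper are hypothetical and chosen only for illustration. Quartiles are computed with the default method of Python's `statistics.quantiles`, which is one of several conventions.

```python
from statistics import fmean, median, pstdev, quantiles


def summarize(values):
    """Return classical (mean, std) and robust (median, IQR) summaries."""
    data = list(values)
    q1, _, q3 = quantiles(data, n=4)  # quartiles, default 'exclusive' method
    return {
        "mean": fmean(data),
        "std": pstdev(data),      # population standard deviation
        "median": median(data),
        "iqr": q3 - q1,           # interquartile range
    }


# Hypothetical sample values (not from the source).
base = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 1.0, 1.1]
with_extreme = base + [25.0]  # one valid but extreme value

summary_base = summarize(base)
summary_extreme = summarize(with_extreme)

for name in ("mean", "std", "median", "iqr"):
    print(f"{name:>6}: {summary_base[name]:.3f} -> {summary_extreme[name]:.3f}")
```

In this made-up sample the mean and standard deviation move substantially when the extreme value is added, while the median and interquartile range barely change. That difference is what "less sensitive" means here.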
Many geostatistical methods require a transformation of the data that reduces the influence of extreme values. Probability plots can sometimes be used to help identify and correct extreme values (see Fig. 2.7). The values in the upper tail of the distribution could be moved back in line with the trend determined from the other data. An alternative is capping, whereby values higher than a defined outlier threshold are reset to the threshold itself. The high values could be in-
Since $E\{Z\} = m$, the variance simplifies to:

$$\sigma^2 = E\{Z^2\} - m^2$$
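The shortcut form $E\{Z^2\} - m^2$ can be checked numerically against the definition $E\{[Z - m]^2\}$. The following is a minimal sketch that assumes equally weighted samples and the population variance (division by n). The `variance_two_ways` helper and the grade values are hypothetical.

```python
from statistics import fmean


def variance_two_ways(values):
    """Return the population variance by definition and by the shortcut form."""
    data = list(values)
    m = fmean(data)                                     # m = E{Z}
    by_definition = fmean((z - m) ** 2 for z in data)   # E{[Z - m]^2}
    by_shortcut = fmean(z * z for z in data) - m * m    # E{Z^2} - m^2
    return by_definition, by_shortcut


# Hypothetical grades, for illustration only.
grades = [0.5, 1.2, 0.8, 2.1, 0.4]
var_def, var_short = variance_two_ways(grades)
print(var_def, var_short)
```

The two results agree up to floating-point rounding. In practice the shortcut form can lose precision when the mean is large relative to the spread, so the definition is often the safer choice for computation.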
The square root of the variance is the standard deviation ($\sigma$ or $s$). The standard deviation is in the units of the variable. It is common to calculate a dimensionless coefficient of variation (CV), that is, the ratio of the standard deviation to the mean:

$$\mathrm{CV} = \sigma / m$$
As an approximate guide, a CV less than 0.5 indicates a fairly well-behaved set of data. A CV greater than 2.0 or 2.5 indicates a distribution with significant variability, such that some predictive models may not be appropriate.
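The guide above can be sketched as a small helper. This is an illustration only: it assumes the population standard deviation, uses 2.0 as the upper cut-off (the text gives "2.0 or 2.5"), and the label "intermediate" for values in between is an assumption, not a term from the text. The function names and sample data are hypothetical.

```python
from statistics import fmean, pstdev


def coefficient_of_variation(values):
    """Return CV = sigma / m using the population standard deviation."""
    data = list(values)
    m = fmean(data)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(data) / m


def describe_cv(cv, low=0.5, high=2.0):
    """Map a CV onto the approximate guide; 'high' is an assumed cut-off."""
    if cv < low:
        return "fairly well behaved"
    if cv > high:
        return "significant variability"
    return "intermediate"  # assumed label for values between the cut-offs


# Hypothetical samples, for illustration only.
moderate = [0.5, 1.2, 0.8, 2.1, 0.4]
skewed = [0.1, 0.1, 0.2, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 50.0]

cv_moderate = coefficient_of_variation(moderate)
cv_skewed = coefficient_of_variation(skewed)
print(f"{cv_moderate:.3f}", describe_cv(cv_moderate))
print(f"{cv_skewed:.3f}", describe_cv(cv_skewed))
```

Because the CV divides by the mean, it is only meaningful for variables with a positive mean, such as grades.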
There are additional measures of central tendency aside from the mean. They include the median (50% of the data smaller and 50% larger), the mode (the most common observation), and the geometric mean. There are also measures of spread aside from the variance. They include the range (difference between the largest and smallest observation), among others.