Parametric distributions have three significant advantages:
(1) they are amenable to mathematical calculations, (2) the
PDF and CDF are analytically known for all z values, and
(3) they are defined with a few parameters. The primary dis-
advantage of parametric distributions is that, in general, real
data do not conveniently fit a parametric model. However,
data transformation permits data following any distribution
to be transformed to any other distribution, thus capitalizing
on most of the benefits of parametric distributions.
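As an illustration of such a transformation, the sketch below maps data from an arbitrary distribution to standard normal scores via the quantile transform y = G^{-1}(F(z)), where F is the empirical CDF and G the standard normal CDF. This is a minimal example assuming NumPy and SciPy are available; the function name is illustrative.

```python
import numpy as np
from scipy import stats

def normal_score_transform(z):
    """Quantile (rank) transform to standard normal scores:
    y = G^-1(F(z)), with F the empirical CDF and G the
    standard normal CDF. Illustrative helper, not from the text."""
    n = len(z)
    ranks = stats.rankdata(z)      # average ranks for ties
    p = (ranks - 0.5) / n          # plotting-position probabilities in (0, 1)
    return stats.norm.ppf(p)       # map probabilities through the normal quantile function

# Skewed, grade-like data as a stand-in for real samples
rng = np.random.default_rng(0)
z = rng.lognormal(mean=1.0, sigma=0.8, size=2000)
y = normal_score_transform(z)      # approximately standard normal, rank order preserved
```

The same two-step recipe (probabilities via one CDF, values via the other distribution's quantile function) carries data from any distribution to any other.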
Data distributions are often not well represented by a parametric model. In such cases the distribution can be characterized as non-parametric: all of the data are used to define the distribution through experimental proportions, and no parametric model for the CDF or PDF is required. The CDF is inferred directly from the data as the proportion of data less than or equal to the threshold value z, so that a proportion is associated with a probability. Non-parametric distributions are therefore more flexible.
A non-parametric cumulative distribution function is a step function. Some form of interpolation may be used to provide a more continuous distribution F(z) that extends to arbitrary minimum z_min and maximum z_max values. Linear interpolation is often used. More complex interpolation models could be considered for highly skewed data distributions with limited data.
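The construction above can be sketched as follows: sort the data, assign experimental proportions to the order statistics, and interpolate linearly between them out to chosen bounds. This is a minimal example assuming NumPy; `empirical_cdf`, `z_min`, and `z_max` are illustrative names.

```python
import numpy as np

def empirical_cdf(data, z_min, z_max):
    """Non-parametric CDF F(z) from data, with linear interpolation
    between the steps and extension to user-chosen z_min / z_max."""
    zs = np.sort(np.asarray(data, dtype=float))
    n = len(zs)
    # experimental proportions: F at the i-th order statistic is i / n
    probs = np.arange(1, n + 1) / n
    # bracket with (z_min, 0) and (z_max, 1) so F spans [z_min, z_max]
    xs = np.concatenate(([z_min], zs, [z_max]))
    ps = np.concatenate(([0.0], probs, [1.0]))
    return lambda z: np.interp(z, xs, ps)

F = empirical_cdf([2.0, 4.0, 6.0, 8.0], z_min=0.0, z_max=10.0)
```

Replacing `np.interp` with a power- or hyperbolic-model interpolator in the upper tail is one way to handle highly skewed distributions with limited data.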
2.2.3 Quantiles

Quantiles are specific Z values that have a probabilistic meaning. The p-quantile of the distribution F(z) is the value z_p for which F(z_p) = Prob{Z ≤ z_p} = p. The 99 quantiles with probability values from 0.01 to 0.99 in increments of 0.01 are known as percentiles. The nine quantiles at 0.1, 0.2, …, 0.9 are called deciles. The three quantiles with probability values of 0.25, 0.5 and 0.75 are known as quartiles. The 0.5 quantile is also known as the median. The cumulative distribution function provides the tool for extracting any quantile of interest. The mathematical inverse of the CDF is known as the quantile function:

z_p = F^{-1}(p) = q(p)

The interquartile range (IR or IQR) is the difference between the upper and the lower quartiles, IR = q(0.75) − q(0.25), and is used as a robust measure of the spread of a distribution. The sign of the difference between the mean and the median (m − M) indicates positive or negative skewness.

Quantiles are used for comparing distributions in various ways. They can be used to compare the original data distribution to simulated values, compare two types of samples, or compare assay results from two different laboratories. A good way to do this is with a plot of matching quantiles, that is, a quantile-quantile (Q-Q) plot (Fig. 2.6). To generate a Q-Q plot, we first choose a series of probability values p_k, k = 1, 2, …, K; then we plot q1(p_k) versus q2(p_k), k = 1, 2, …, K. If all the points fall along the 45° line, the two distributions are exactly the same. If the line is shifted from the 45° line but parallel to it, the two distributions have the same shape but different means. If the slope of the line is not 45°, the two distributions have different variances but similar shapes. If there is a nonlinear character to the relationship between the two distributions, they have different histogram shapes and parameters.

Fig. 2.6 An example of a Q-Q plot. The data are total copper values corresponding to two different lithologies

The P-P plot considers matching probabilities for a series of fixed Z values. The P-P plot will vary between 0 and 1 (or 0 and 100 %), from the minimum to the maximum values in both distributions. In practice, Q-Q plots are more useful because they plot the values of interest (grades, thicknesses, permeabilities, etc.), and it is therefore easier to conclude how the two distributions compare based on sample values.

2.2.4 Expected Values

The expected value of a random variable is the probability-weighted average of that random variable:

E{Z} = m = ∫_{−∞}^{+∞} z dF(z) = ∫_{−∞}^{+∞} z f(z) dz

The expected value of a random variable is also known as the mean or the first moment. The expected value can also be considered as a statistical operator; it is a linear operator.
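For a discrete distribution the integral reduces to a probability-weighted sum, which also makes the linearity of the operator easy to verify numerically. The sketch below assumes NumPy; the outcome and probability values are arbitrary.

```python
import numpy as np

# Discrete illustration of E{Z} = sum_i p_i * z_i: a probability-weighted average.
z = np.array([1.0, 2.0, 5.0, 10.0])   # outcomes
p = np.array([0.4, 0.3, 0.2, 0.1])    # probabilities, summing to 1
m = float(np.sum(p * z))              # expected value (mean, first moment)

# Linearity of the expectation operator: E{aZ + b} = a E{Z} + b
a, b = 3.0, 2.0
lhs = float(np.sum(p * (a * z + b)))  # expectation of the transformed variable
rhs = a * m + b                       # same result from linearity
```

Linearity is what allows the expected value of a weighted sum of grades, for example, to be computed directly from the expected values of the individual grades.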