Fig. 2.16 Data transformation using Cumulative Distribution Functions
mean and variance of reasonably large subdivisions within
the domain, as in Fig. 2.14, then a spatial trend model may
be required.
Although the identification of a trend is subjective, it is
generally accepted that the trend is deterministic and should not
have short-scale variability. It should be identified from features
that are significantly larger than the data spacing, i.e.,
domain-wide. A trend is sometimes evident from the experimental
variogram, which may show it in one or more directions: the
experimental variogram continues to increase above the variance of
the data as the lag distance increases (Chap. 6; Journel and
Huijbregts 1978). This usually indicates that the decision of
stationarity should be revisited, considering whether the domain
should be subdivided or a trend considered.
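
The variogram signature of a trend can be illustrated with a small sketch. The data below are hypothetical (a linear trend plus a small alternating fluctuation on a regular 1-D grid), and the function name experimental_variogram is chosen here for illustration only. The sketch computes the experimental semivariogram and lists the lags at which it exceeds the variance of the data.

```python
from statistics import pvariance

def experimental_variogram(values, max_lag):
    """Semivariogram gamma(h) for regularly spaced 1-D data, h = 1..max_lag."""
    gammas = {}
    for h in range(1, max_lag + 1):
        # Squared differences of all pairs separated by h grid steps.
        sq_diffs = [(values[i + h] - values[i]) ** 2
                    for i in range(len(values) - h)]
        gammas[h] = 0.5 * sum(sq_diffs) / len(sq_diffs)
    return gammas

# Hypothetical data: linear trend plus a small alternating fluctuation.
n = 100
values = [0.05 * i + (0.2 if i % 2 else -0.2) for i in range(n)]

variance = pvariance(values)                 # variance of the data (the sill)
gammas = experimental_variogram(values, max_lag=60)

# Lags at which the variogram rises above the data variance.
lags_above_variance = [h for h, g in gammas.items() if g > variance]
print(variance, lags_above_variance)
```

With a trend present, the variogram does not level off at the variance but keeps climbing with lag distance, which is the behaviour described above.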
$$ y = G^{-1}(F(z)) $$
which is back-transformed by
$$ z = F^{-1}(G(y)) $$
where F is the CDF of the original variable z and G is the standard
Gaussian CDF.
The expected values should not be back transformed unless
the distribution is symmetric.
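
A minimal sketch of why expected values should not be back-transformed follows. It assumes, purely for illustration, that Y is standard Gaussian and that the back-transform is z = exp(y), which yields a positively skewed (lognormal) Z. The mean of the back-transformed values is compared with the back-transform of the mean.

```python
import math
import random
from statistics import mean

random.seed(0)  # fixed seed so the illustration is repeatable

# Assumption for illustration: Y ~ N(0, 1), back-transform z = exp(y).
y = [random.gauss(0.0, 1.0) for _ in range(10_000)]
z = [math.exp(v) for v in y]

mean_of_back_transformed = mean(z)         # estimates E[Z]
back_transformed_mean = math.exp(mean(y))  # back-transform of E[Y]

print(mean_of_back_transformed, back_transformed_mean)
```

For this skewed case the back-transformed mean falls near the median of Z rather than its mean, so it understates the expected value.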
A variable Z is non-standard Gaussian when the standardized
variable Y is standard Gaussian. A non-standard Gaussian value is
easily converted to/from a standard Gaussian value.
$$ y = \frac{z - m_Z}{\sigma_Z} $$
$$ z = y \cdot \sigma_Z + m_Z $$
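
The conversion between a non-standard and a standard Gaussian value can be sketched in a few lines. The function names and the numerical values of the mean and standard deviation are hypothetical.

```python
def standardize(z, m_z, sigma_z):
    """y = (z - m_Z) / sigma_Z"""
    return (z - m_z) / sigma_z

def destandardize(y, m_z, sigma_z):
    """z = y * sigma_Z + m_Z"""
    return y * sigma_z + m_z

# Hypothetical parameters of Z.
m_z, sigma_z = 2.0, 0.5
y = standardize(3.0, m_z, sigma_z)
z = destandardize(y, m_z, sigma_z)
print(y, z)
```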
The normal score transform is rank preserving and reversible. Its
disadvantages are that the transformed numbers are more difficult
to interpret than the original values, and that the distribution
parameters cannot be back-transformed directly because the
transform is nonlinear.
Spikes of constant values in the original distribution can cause
problems. Gaussian values are continuous, so ties (equal values) in
the original distribution must be resolved prior to transforming
the data. Two methods are commonly used to break the ties, or
despike. The simpler method is to add a small random component to
each tie, which is the most common approach in popular software
packages, such as the GSLIB programs (Deutsch and Journel 1997). A
better alternative is to add a random component based on local
averages of the data (Verly 1984), which ranks the ties based on
the local grades of nearby data. Although more onerous in terms of
time and computer effort, it is justified when the proportion of
original data with the same value is significant. Typical drill
hole data from Au epithermal deposits can show a significant number
of values at or below the laboratory's detection limit, sometimes
as much as 50 or 60%, in which case despiking is better
accomplished using the local averaging method. Of course, an
alternative is to separate the barren or un-mineralized material
into its own stationary population.
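
The simpler of the two despiking methods can be sketched as follows. This is an illustration only, not the GSLIB implementation: the function name despike_random, the choice of a uniform random component, and its size (half the smallest gap between distinct values, so that the order of distinct values is never changed) are all assumptions. The local averaging method of Verly (1984) is not shown.

```python
import random

def despike_random(values, seed=0):
    """Break ties by adding a small random component to tied values only."""
    rng = random.Random(seed)
    distinct = sorted(set(values))
    gaps = [b - a for a, b in zip(distinct, distinct[1:])]
    # Keep the perturbation below the smallest gap between distinct values.
    eps = 0.5 * min(gaps) if gaps else 1.0
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return [v + rng.random() * eps if counts[v] > 1 else v for v in values]

# Hypothetical grades with a spike at a detection limit of 0.01.
grades = [0.01, 0.01, 0.01, 0.01, 0.35, 0.80, 0.80, 1.20]
despiked = despike_random(grades)
print(despiked)
```

After despiking, every value is distinct and the ordering between originally distinct values is unchanged, so the data can be ranked unambiguously.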
2.4 Gaussian Distribution and Data Transformations
Gaussian distributions are commonly used due to their convenient
statistical properties. The Gaussian distribution arises as the
limiting distribution in the Central Limit Theorem, which is one of
the most consequential theorems in statistics.
A univariate Gaussian distribution is fully characterized by its
mean (m) and standard deviation (σ). The probability density
function is given by:
$$ g(z) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{z - m}{\sigma} \right)^{2} \right] $$
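
As a quick check of the density formula, the sketch below evaluates it directly and compares the result with the Gaussian density provided by Python's standard library (statistics.NormalDist). The function name gaussian_pdf and the parameter values are hypothetical.

```python
import math
from statistics import NormalDist

def gaussian_pdf(z, m, sigma):
    """g(z) = 1 / (sigma * sqrt(2*pi)) * exp(-0.5 * ((z - m) / sigma)**2)"""
    return (math.exp(-0.5 * ((z - m) / sigma) ** 2)
            / (sigma * math.sqrt(2.0 * math.pi)))

# Hypothetical mean and standard deviation.
m, sigma = 1.0, 2.0
direct = gaussian_pdf(1.3, m, sigma)
reference = NormalDist(m, sigma).pdf(1.3)
print(direct, reference)
```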
It is common to transform data to a Gaussian distribution. There
are many instances where the prediction of uncertainty at
un-sampled locations becomes much easier with a Gaussian
distribution.
The simplest method to transform any distribution into a Gaussian
distribution is a direct quantile-to-quantile transformation,
whereby the CDF of each distribution is used to perform the
transform. This is known as the Normal Scores (NS) transform; see
Fig. 2.16. The NS transform is achieved by quantile transformation.
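
The quantile transformation can be sketched with the standard library alone. Several choices here are assumptions rather than part of the method as described: the empirical CDF uses the plotting position (rank + 0.5)/n, the data are assumed to contain no ties (already despiked), the back-transform interpolates linearly in the table of (score, value) pairs, and values beyond the table are clamped to the data minimum and maximum rather than extrapolated. The function names are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist

def normal_scores(values):
    """Return normal scores y = G^{-1}(F(z)) and the sorted (y, z) table."""
    n = len(values)
    gaussian = NormalDist()  # standard Gaussian distribution, G
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        p = (rank + 0.5) / n             # empirical F(z), strictly in (0, 1)
        scores[i] = gaussian.inv_cdf(p)  # matching standard Gaussian quantile
    table = sorted(zip(scores, values))
    return scores, table

def back_transform(y, table):
    """z = F^{-1}(G(y)) by linear interpolation in the (y, z) table."""
    ys = [s for s, _ in table]
    zs = [z for _, z in table]
    if y <= ys[0]:
        return zs[0]    # clamp to the data minimum (no tail extrapolation)
    if y >= ys[-1]:
        return zs[-1]   # clamp to the data maximum
    j = bisect_right(ys, y)
    t = (y - ys[j - 1]) / (ys[j] - ys[j - 1])
    return zs[j - 1] + t * (zs[j] - zs[j - 1])

# Hypothetical data without ties.
data = [3.2, 0.4, 1.1, 7.9, 2.5]
scores, table = normal_scores(data)
recovered = [back_transform(y, table) for y in scores]
print(scores, recovered)
```

The scores keep the ranking of the original values, and back-transforming each score returns the original value, which illustrates the rank-preserving and reversible character of the transform.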
 