2.6.1.2 Correlation Coefficients

The correlation coefficient measures the strength of the dependency between two parameters by comparing how far pairs of values (x, y) deviate from a straight-line function, and is given by:

$$\rho = \frac{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)}{\sigma_x \sigma_y} \qquad (2.3)$$

where
$N$ = number of points in the data set
$x_i$, $y_i$ = values of point $i$ in the two data sets
$\mu_x$, $\mu_y$ = mean values of the two data sets, and
$\sigma_x$, $\sigma_y$ = standard deviations of the two data sets (the square root of the variance)

If the outcome of the above function is positive then higher values of x tend to occur with higher values of y, and the data sets are said to be 'positively correlated'. If the outcome is $\rho = 1$ then the relationship between x and y is a simple straight line. A negative outcome means high values of one data set correlate with low values of the other: 'negative correlation'. A zero result indicates no correlation.

Note that correlation coefficients assume that the relationship between the two data sets is linear. For example, two data sets which have a log-linear relationship might be very strongly related but still display a poor correlation coefficient. Of course, a coefficient can still be calculated if the log-normal data set (e.g. permeability) is first converted to a linear form by taking the logarithm of the data.

Correlation between datasets (e.g. porosity versus permeability) is typically entered into reservoir modelling packages as a value between 0 and 1, in which values of 0.7 or higher generally indicate a strong relationship. The value may be described as the 'dependency'.

2.6.1.3 The Variogram

Correlation coefficients reflect the variation of values within a dataset, but say nothing about how these values vary spatially. For reservoir modelling we need to express spatial variation of parameters, and the central concept controlling this is the variogram.

The variogram captures the relationship between the difference in value of pairs of data points, and the distance separating those two points. Numerically, this is expressed as the averaged squared differences between the pairs of data in the data set, given by the empirical variogram function, which is most simply expressed as:

$$2\gamma = \frac{1}{N}\sum (z_i - z_j)^2 \qquad (2.4)$$

where $z_i$ and $z_j$ are pairs of points in the dataset and $N$ is here the number of such pairs. For convenience we generally use the semivariogram function:

$$\gamma = \frac{1}{2N}\sum (z_i - z_j)^2 \qquad (2.5)$$

The semivariogram function can be calculated for all pairs of points in a data set, whether or not they are regularly spaced, and can therefore be used to describe the relationship between data points from, for example, irregularly scattered wells.

The results of variogram calculations can be represented graphically (e.g. Fig. 2.22) to establish the relationship between the separation distance (known as the lag) and the average $\gamma$ value for pairs of points which are that distance apart. The data set has to be grouped into distance bins to do the averaging; hence only one value appears for any given lag in Fig. 2.22.

A more formal definition of semi-variance is given by:

$$\gamma(h) = \frac{1}{2} E\left\{ \left[ Z(x + h) - Z(x) \right]^2 \right\} \qquad (2.6)$$

where
$E$ = the expectation (or mean)
$Z(x)$ = the value at a point in space
$Z(x + h)$ = the value at a separation distance, $h$ (the lag)

Generally, $\gamma$ increases as a function of separation distance. Where there is some relationship between the values in a spatial dataset, $\gamma$ shows smaller values for points which are closer together in space, and therefore more likely to have similar values (due to some underlying process such as the tendency for similar rock types to occur together). As the separation distance increases the difference between the paired samples tends to increase.

Fitting a trend line through the points on a
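As an illustration of the correlation coefficient function in Sect. 2.6.1.2, the following is a minimal Python sketch, assuming NumPy is available. The function name `correlation_coefficient` and the porosity and permeability values are hypothetical and synthetic, chosen only to show how a log-linear relationship yields a lower coefficient until the logarithm of the data is taken.

```python
import numpy as np


def correlation_coefficient(x, y):
    """Correlation coefficient: mean product of deviations over sigma_x * sigma_y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape or x.size < 2:
        raise ValueError("x and y must be 1-D arrays of equal length (>= 2)")
    mu_x, mu_y = x.mean(), y.mean()
    # Population standard deviations (ddof=0), consistent with the 1/N averaging.
    sigma_x, sigma_y = x.std(), y.std()
    if sigma_x == 0.0 or sigma_y == 0.0:
        raise ValueError("correlation is undefined for a constant data set")
    return np.mean((x - mu_x) * (y - mu_y)) / (sigma_x * sigma_y)


# Synthetic, hypothetical data: permeability is exactly log-linear in porosity.
porosity = np.linspace(0.05, 0.30, 26)
permeability = 10.0 ** (12.0 * porosity - 1.0)

rho_raw = correlation_coefficient(porosity, permeability)
rho_log = correlation_coefficient(porosity, np.log10(permeability))
print("rho (porosity vs permeability):      ", rho_raw)
print("rho (porosity vs log10 permeability):", rho_log)
```

In this sketch the raw coefficient falls below the log-transformed one even though the two variables are perfectly related, which is the behaviour the text describes for log-normal data such as permeability.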
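The binned semivariogram calculation described in Sect. 2.6.1.3 can be sketched as follows, again assuming NumPy. This is a simple illustration rather than the method of any particular modelling package: the function name `empirical_semivariogram`, the bin edges and the synthetic well data are all hypothetical. Each pair of points is counted once, pairs are grouped into lag bins by separation distance, and half the mean squared difference is reported per bin.

```python
import numpy as np


def empirical_semivariogram(coords, values, bin_edges):
    """Return (mean lag, semivariance, pair count) for each distance bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    if coords.ndim == 1:
        coords = coords[:, None]  # treat a 1-D array as positions along a line
    bin_edges = np.asarray(bin_edges, dtype=float)

    # All unique pairs (i < j); works for irregularly spaced points.
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2

    n_bins = len(bin_edges) - 1
    lags = np.full(n_bins, np.nan)
    gamma = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = (dist >= bin_edges[b]) & (dist < bin_edges[b + 1])
        counts[b] = in_bin.sum()
        if counts[b] > 0:
            lags[b] = dist[in_bin].mean()
            # Semivariance: (1 / 2N) * sum of squared differences in this bin.
            gamma[b] = 0.5 * sq_diff[in_bin].mean()
    return lags, gamma, counts


# Synthetic, hypothetical example: 60 irregularly scattered "wells".
rng = np.random.default_rng(0)
well_xy = rng.uniform(0.0, 1000.0, size=(60, 2))
porosity = (0.20 + 0.05 * np.sin(well_xy[:, 0] / 300.0)
            + 0.01 * rng.standard_normal(60))
edges = np.arange(0.0, 801.0, 100.0)  # eight 100 m lag bins (assumed units)

lags, gamma, counts = empirical_semivariogram(well_xy, porosity, edges)
for h, g, n in zip(lags, gamma, counts):
    print(f"lag {h:7.1f}  gamma {g:.5f}  pairs {n:4d}")
```

Bins that contain no pairs are left as NaN rather than filled in, so sparse lags remain visible when the results are plotted against lag.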