[r,p] = corrcoef(x,y)
r =
1.0000 0.9403
0.9403 1.0000
p =
1.0000 0.0000
0.0000 1.0000
In our example the p-value is close to zero, suggesting that the correlation
coefficient is significant. We conclude from this experiment that this
particular significance test fails to detect correlations attributed to an
outlier. We therefore try an alternative t-test statistic to determine the
significance of the correlation between x and y. According to this test, we can
reject the null hypothesis that there is no correlation if the calculated t is
larger than the critical t (n-2 degrees of freedom, α=0.05).
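Written out, the statistic evaluated in this test takes the standard form for testing a correlation coefficient r computed from n data points. The following is only a rough check, under two assumptions: n = 31, which is inferred from the 29 degrees of freedom that match the critical value reported for this example, and the rounded value r = 0.9403 from the output above.

```latex
t_{\mathrm{calc}} = r \sqrt{\frac{n-2}{1-r^{2}}}
\approx 0.9403 \sqrt{\frac{29}{1-0.9403^{2}}}
\approx 14.9
```

The null hypothesis of no correlation is rejected at the one-sided 5% level when this value exceeds the 0.95 quantile of the t distribution with n-2 degrees of freedom.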
tcalc = r(2,1) * ((length(x)-2)/(1-r(2,1)^2))^0.5
tcrit = tinv(0.95,length(x)-2)
tcalc =
14.8746
tcrit =
1.6991
This result indeed indicates that we can reject the null hypothesis of no
correlation. Like the p-value above, the t-test therefore reports a significant
correlation and does not reveal the influence of the outlier. As an alternative
to detecting outliers, resampling schemes or surrogates such as the bootstrap or
jackknife methods represent powerful tools for assessing the statistical
significance of the results. These techniques are particularly useful when
scanning large multivariate data sets for outliers (see Chapter 9). Resampling
schemes repeatedly resample the original data set of n data points, either by
drawing subsamples of n-1 data points n times, each time leaving out a different
data point (the jackknife), or by picking an arbitrary set of subsamples with n
data points with replacement (the bootstrap). The statistics of these subsamples
provide better information on the characteristics of the population than the
statistical parameters (mean, standard deviation, correlation coefficients)
computed from the full data set. The function bootstrp allows resampling of
our bivariate data set, including the outlier (x,y)=(20,20).
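The two resampling schemes described above can also be written out by hand, which may help to make clear what each one does. The following is a minimal sketch, not the procedure used in the text: the data are a stand-in (30 uncorrelated Gaussian points plus the outlier at (20,20)), and the variable names rjack, rboot, nboot, keep and idx are introduced here for illustration only.

```matlab
% Stand-in data (an assumption, not the original data set): 30 uncorrelated
% Gaussian points plus the outlier (20,20) mentioned in the text.
rng(0)
x = [randn(30,1); 20];
y = [randn(30,1); 20];
n = length(x);

% Jackknife: leave out one data point at a time, n times in total.
rjack = zeros(n,1);
for i = 1:n
    keep = true(n,1);
    keep(i) = false;                    % drop the i-th data point
    r = corrcoef(x(keep),y(keep));
    rjack(i) = r(2,1);
end

% Bootstrap: draw n indices with replacement, nboot times in total.
nboot = 1000;
rboot = zeros(nboot,1);
for i = 1:nboot
    idx = randi(n,n,1);                 % n random indices, repeats allowed
    r = corrcoef(x(idx),y(idx));
    rboot(i) = r(2,1);
end

% Compare the replicate that omits the outlier (the last data point)
% with the replicates that still contain it.
disp(rjack(end))
disp(min(rjack(1:end-1)))
disp(median(rboot))
```

In this sketch the jackknife replicate that leaves out the last data point is the only one computed without the outlier, so comparing it with the remaining replicates gives a first impression of how strongly a single point controls the correlation coefficient.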
rng(0)
rhos1000 = bootstrp(1000,'corrcoef',x,y);
This command first resamples the data a thousand times; it then calculates
the correlation coefficient for each new subsample and stores the result in
the variable rhos1000. Since corrcoef delivers a 2-by-2 matrix (as mentioned