above), rhos1000 has the dimensions of 1000-by-4, i.e., 1000 values for each element of the 2-by-2 correlation matrix. Plotting the histogram of the 1000 values for the second element, i.e., the correlation coefficient of (x,y), illustrates how this parameter is dispersed depending on whether the outlier is present in or absent from a subsample. Since the distribution of rhos1000 contains many empty classes, we use a large number of bins.
histogram(rhos1000(:,2),30)
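The command that created rhos1000 is not part of this excerpt. As a hedged sketch, the MATLAB code below builds an array with the same 1000-by-4 layout by resampling the (x,y) pairs with replacement and storing each 2-by-2 matrix returned by corrcoef as one row. The synthetic data (30 uncorrelated normal pairs plus one outlier at (20,20)) are an assumption chosen for illustration, not the data used in the text.

```matlab
% Hypothetical synthetic data: 30 uncorrelated pairs plus one outlier
x = randn(30,1);
y = randn(30,1);
x(31) = 20;
y(31) = 20;

% Manual bootstrap: resample the 31 (x,y) pairs with replacement
nboot = 1000;
n = length(x);
rhos1000 = zeros(nboot,4);
for k = 1 : nboot
    idx = randi(n,n,1);            % n indices drawn with replacement
    r = corrcoef(x(idx),y(idx));   % 2-by-2 correlation matrix
    rhos1000(k,:) = r(:)';         % column-major row: [1 r_xy r_yx 1]
end
```

Column 2 of this array can then be plotted with histogram(rhos1000(:,2),30) as in the command above. The Statistics and Machine Learning Toolbox function bootstrp performs the same resampling in a single call; the explicit loop merely makes each step visible.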
The histogram shows a cluster of correlation coefficients at around r = 0.1 that roughly follows a normal distribution, and a strong peak close to r = 1 (Fig. 4.3).
The interpretation of this histogram is relatively straightforward. When a subsample contains the outlier, its correlation coefficient is close to one, whereas subsamples without the outlier yield a very low (close to zero) correlation coefficient, suggesting the absence of any strong interdependence between the two variables x and y.
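This interpretation can be checked directly by recording, for each subsample, whether it contains the outlier. The sketch below assumes hypothetical synthetic data (30 uncorrelated normal pairs plus an outlier at (20,20) stored in position 31) and compares the mean correlation coefficient of subsamples with and without the outlier; the variable names rxy and hasOutlier are illustrative.

```matlab
% Hypothetical synthetic data: the outlier sits in position 31
x = randn(30,1);
y = randn(30,1);
x(31) = 20;
y(31) = 20;

nboot = 1000;
n = length(x);
rxy = zeros(nboot,1);            % correlation coefficient of each subsample
hasOutlier = false(nboot,1);     % true if the subsample drew index 31
for k = 1 : nboot
    idx = randi(n,n,1);
    r = corrcoef(x(idx),y(idx));
    rxy(k) = r(1,2);
    hasOutlier(k) = any(idx == n);
end

fprintf('Subsamples with outlier:    %d, mean r = %.2f\n', ...
    sum(hasOutlier), mean(rxy(hasOutlier)))
fprintf('Subsamples without outlier: %d, mean r = %.2f\n', ...
    sum(~hasOutlier), mean(rxy(~hasOutlier)))
```

Because a resample of 31 values misses a particular observation with probability (30/31)^31 ≈ 0.36, roughly a third of the subsamples should fall into the cluster near zero and the remainder into the peak near one.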
Bootstrapping therefore provides a simple but powerful tool for either accepting or rejecting our first estimate of the population correlation coefficient. Applying the above procedure to the synthetic sediment data, by contrast, yields a clear unimodal Gaussian distribution for the correlation coefficients of the subsamples.
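One common way to turn the bootstrap distribution into an accept-or-reject decision is a percentile confidence interval, i.e., the 2.5th and 97.5th percentiles of the sorted bootstrap replicates. The text does not specify this step, so the sketch below is only an illustration under hypothetical synthetic data (30 uncorrelated normal pairs, analyzed once as they are and once with an added outlier at (20,20)). A very wide interval, as produced by the data containing the outlier, signals that a single estimate of r should not be trusted.

```matlab
% Hypothetical synthetic data with and without an outlier
x = randn(30,1);
y = randn(30,1);
xo = [x; 20];
yo = [y; 20];

nboot = 1000;
rb_clean = zeros(nboot,1);   % bootstrap replicates without outlier
rb_out = zeros(nboot,1);     % bootstrap replicates with outlier
for k = 1 : nboot
    i30 = randi(30,30,1);
    i31 = randi(31,31,1);
    r = corrcoef(x(i30),y(i30));   rb_clean(k) = r(1,2);
    r = corrcoef(xo(i31),yo(i31)); rb_out(k) = r(1,2);
end

% Percentile interval: 25th and 975th of the 1000 sorted replicates
rb_clean = sort(rb_clean);
rb_out = sort(rb_out);
ilo = round(0.025*nboot);
ihi = round(0.975*nboot);
ci_clean = [rb_clean(ilo) rb_clean(ihi)];
ci_out = [rb_out(ilo) rb_out(ihi)];

r_hat = corrcoef(xo,yo);
fprintf('r of data with outlier: %.2f\n', r_hat(1,2))
fprintf('95%% interval without outlier: [%.2f, %.2f]\n', ci_clean)
fprintf('95%% interval with outlier:    [%.2f, %.2f]\n', ci_out)
```

The percentile method is the simplest bootstrap interval; it is used here for transparency, and other interval constructions exist.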
Fig. 4.3 Bootstrap result for Pearson's correlation coefficient r from 1000 subsamples. The histogram shows a roughly normally distributed cluster of correlation coefficients at around r = 0, suggesting that these subsamples do not include the outlier. The strong peak close to r = 1, however, suggests that an outlier with high values for the two variables x and y is present in the corresponding subsamples.