ans =
1.0000 0.4641
0.4641 1.0000
After increasing the absolute (x,y) values of this outlier, the correlation
coefficient increases dramatically.
x(31,1) = 10; y(31,1) = 10;
plot(x,y,'o'), axis([-1 20 -1 20]);
corrcoef(x,y)
ans =
1.0000 0.7636
0.7636 1.0000
The coefficient reaches a value close to r = 1 if the outlier has a value of
(x,y) = (20,20).
x(31,1) = 20; y(31,1) = 20;
plot(x,y,'o'), axis([-1 20 -1 20]);
corrcoef(x,y)
ans =
1.0000 0.9275
0.9275 1.0000
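The effect of moving the outlier can be summarized in a single loop. The
following is a minimal sketch, not the original script: the 30 Gaussian random
points, the seed, the starting outlier value of 5, and the variable names
outlier, xo, yo, and r are assumptions made here, since the definition of x
and y is not shown in this excerpt.

```matlab
% Assumed stand-in data: 30 uncorrelated Gaussian points (not the original x, y)
rng(0)
x = randn(30,1);
y = randn(30,1);

% Move a single outlier along the diagonal and record Pearson's r each time
outlier = [5 10 20];               % 5 is an assumed starting value
r = zeros(size(outlier));
for i = 1:length(outlier)
    xo = [x; outlier(i)];          % append the outlier as point 31
    yo = [y; outlier(i)];
    c = corrcoef(xo,yo);           % 2-by-2 correlation matrix
    r(i) = c(1,2);                 % off-diagonal element is r
end
disp([outlier' r'])                % outlier position next to resulting r
```

In the text, the outlier values 10 and 20 give r = 0.7636 and r = 0.9275. The
values from this sketch will differ because the random points differ, but the
coefficient should rise in the same way as the outlier moves away from the
point cloud.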
The bivariate data set itself still provides little evidence of a strong
dependence. The dramatic increase of the correlation coefficient results
solely from combining the random bivariate (x,y) data with one single
outlier. Whereas outliers are easy to identify in a bivariate scatter plot,
erroneous values might be overlooked in large multivariate data sets.
Various methods exist to calculate the significance of Pearson's correlation
coefficient. The function corrcoef can return p-values and confidence bounds
as additional outputs for evaluating the quality of the result. Furthermore,
resampling schemes or surrogates such as the bootstrap or jackknife method
provide an alternative way of assessing the statistical significance of the
results. These methods repeatedly resample the original data set of N data
points, either by leaving out one data point at a time to form N subsamples
of N-1 points each (the jackknife) or by drawing an arbitrary number of
subsamples of N data points with replacement (the bootstrap). The statistics
of these subsamples provide better information on the characteristics of the
population than statistical parameters (mean, standard deviation, correlation
coefficients) computed from the full data set. The function bootstrp allows
resampling of our bivariate data set including the outlier (x,y) = (20,20).
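The two resampling schemes can be sketched with core functions only, so that
each step stays visible. This is an illustrative sketch under assumptions, not
the listing that uses bootstrp: the stand-in data, the seed, the choice of
1000 bootstrap replicates, and the names rjack, rboot, nboot, keep, and idx
are all hypothetical.

```matlab
% Assumed stand-in data: 30 uncorrelated Gaussian points plus the outlier (20,20)
rng(0)
x = [randn(30,1); 20];
y = [randn(30,1); 20];
N = length(x);

% Jackknife: leave out one point at a time, giving N subsamples of size N-1
rjack = zeros(N,1);
for i = 1:N
    keep = [1:i-1, i+1:N];         % indices of all points except point i
    c = corrcoef(x(keep),y(keep));
    rjack(i) = c(1,2);
end

% Bootstrap: draw N points with replacement, repeated nboot times
nboot = 1000;                      % assumed number of replicates
rboot = zeros(nboot,1);
for i = 1:nboot
    idx = randi(N,N,1);            % N random indices, repeats allowed
    c = corrcoef(x(idx),y(idx));
    rboot(i) = c(1,2);
end

% Summaries of both distributions
fprintf('jackknife: min r = %.3f, max r = %.3f\n', min(rjack), max(rjack));
fprintf('bootstrap: median r = %.3f\n', median(rboot));
```

The jackknife subsample that omits the outlier should give a much lower
coefficient than all the others. Bootstrap samples that happen to omit the
outlier should likewise yield low coefficients, so a histogram of rboot would
be expected to show two separate groups of values. Where the Statistics and
Machine Learning Toolbox is available, a call such as
bootstrp(nboot,'corrcoef',x,y) should replace the bootstrap loop, returning
one row of flattened correlation-matrix elements per replicate.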