Discussion
In my experience, people often fail to check a correlation for significance. In fact, many
people are unaware that a correlation can be insignificant. They jam their data into a
computer, calculate the correlation, and blindly believe the result. However, they
should ask themselves: Was there enough data? Is the magnitude of the correlation
large enough? Fortunately, the cor.test function answers those questions.
Suppose we have two vectors, x and y, with values from normal populations. We might
be very pleased that their correlation is greater than 0.83:
> cor(x, y)
[1] 0.8352458
But that is naïve. If we run cor.test, it reports a relatively large p-value of 0.1648:
> cor.test(x, y)

	Pearson's product-moment correlation

data:  x and y
t = 2.1481, df = 2, p-value = 0.1648
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.6379590  0.9964437
sample estimates:
      cor
0.8352458
The p-value is above the conventional threshold of 0.05, so we cannot conclude that the
correlation is significant. The output shows why: df = 2 means the test had only four
observations to work with.
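The t statistic and p-value in that output can be reproduced by hand, which makes it easier to see how little evidence four observations provide. This is only a sketch: it assumes the usual t test for a Pearson correlation, t = r * sqrt(df) / sqrt(1 - r^2) with df = n - 2, and the variable names are my own. The inputs are simply the numbers printed above.

```r
# Values copied from the cor.test output above
r  <- 0.8352458
df <- 2                      # df = n - 2, so n = 4 observations

# t statistic for testing whether the true correlation is zero
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with df degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df)

print(round(t_stat, 4))
print(round(p_value, 4))
```

Both values should agree with the t and p-value lines of the output, up to rounding.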
You can also check the correlation by using the confidence interval. In this example,
the confidence interval is (−0.638, 0.996). The interval contains zero, so it is possible
that the correlation is zero, in which case there would be no correlation. Again, you
could not be confident that the reported correlation is significant.
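The width of that interval follows directly from the sample size. As a rough sketch, assuming the interval is built with Fisher's z transform (which is how the cor.test documentation describes it), the limits can be recomputed from the reported correlation and n = 4. The variable names here are illustrative only.

```r
# Values taken from the cor.test output above
r <- 0.8352458
n <- 4                       # df = 2 implies four observations

# Fisher's z transform of the sample correlation
z <- atanh(r)

# Half-width of a 95 percent interval on the z scale;
# the standard error is 1 / sqrt(n - 3)
half_width <- qnorm(0.975) / sqrt(n - 3)

# Transform the limits back to the correlation scale
ci <- tanh(c(z - half_width, z + half_width))
print(ci)

# TRUE when the interval contains zero
print(ci[1] < 0 && ci[2] > 0)
```

With n = 4 the standard error on the z scale is 1, which is why the interval stretches across nearly the whole range from -1 to 1.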
The cor.test output also includes the point estimate reported by cor (at the bottom,
labeled “sample estimates”), saving you the additional step of running cor.
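If you need these numbers in a script rather than on the screen, the object returned by cor.test can be indexed by name. The sketch below uses small made-up vectors (they are not the x and y from the example above) purely to show the estimate, p.value, and conf.int components.

```r
# Hypothetical data, chosen only for illustration
x <- c(1, 2, 3, 4)
y <- c(2, 1, 4, 3)

result <- cor.test(x, y)

# Point estimate: the same value that cor(x, y) returns
print(result$estimate)

# p-value of the test that the true correlation is zero
print(result$p.value)

# Confidence interval as a two-element numeric vector
print(result$conf.int)

# A simple significance check at the conventional 0.05 level
is_significant <- result$p.value < 0.05
print(is_significant)
```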
By default, cor.test calculates the Pearson correlation, which assumes that the underlying
populations are normally distributed. The Spearman method makes no such assumption
because it is nonparametric. Use method="spearman" (all lowercase) when working with
nonnormal data.
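A call with the Spearman method might look like the following. The data are invented for illustration: x is strongly right-skewed, so the normality assumption behind the Pearson test is doubtful. The vectors have no ties, which avoids the warning cor.test gives when it cannot compute an exact p-value.

```r
# Hypothetical, strongly skewed data (no tied values)
x <- c(0.2, 1.5, 3.1, 4.8, 9.7, 25.3, 60.4, 150.2)
y <- c(1.1, 0.9, 2.5, 2.2, 4.0, 3.7, 8.9, 7.5)

# Rank-based test; the method name is lowercase
sp <- cor.test(x, y, method = "spearman")

# Spearman's rho, labeled "rho" in the sample estimates
print(sp$estimate)

# p-value for the hypothesis that rho is zero
print(sp$p.value)
```

Because the test works on ranks, the extreme values at the top of x carry no more weight than any other observation.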
See Also
See Recipe 1.8 for computing correlations and other basic statistics.
 