
Discussion

In my experience, people often fail to check a correlation for significance. In fact, many people are unaware that a correlation can be statistically insignificant. They jam their data into a computer, calculate the correlation, and blindly believe the result. However, they should ask themselves: Was there enough data? Is the magnitude of the correlation large enough? Fortunately, the cor.test function answers those questions.

Suppose we have two vectors, x and y, with values from normal populations. We might be very pleased that their correlation is greater than 0.83:

> cor(x, y)
[1] 0.8352458

But that is naïve. If we run cor.test, it reports a relatively large p-value of 0.1648:

> cor.test(x, y)

        Pearson's product-moment correlation

data:  x and y
t = 2.1481, df = 2, p-value = 0.1648
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.6379590  0.9964437
sample estimates:
      cor
0.8352458
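The t statistic in this output is how cor.test weighs both questions raised earlier, the amount of data and the size of the correlation. For the Pearson test it is t = r * sqrt(n - 2) / sqrt(1 - r^2), referred to a t distribution with n - 2 degrees of freedom. The sketch below reproduces the printed t and p-value by hand; r is copied from the output, and n = 4 is inferred from df = 2 rather than taken from the original data, which is not shown.

```r
# Sketch: reproduce the Pearson t statistic and p-value reported by cor.test.
# Assumption: r is copied from the output above; n = 4 is inferred from df = n - 2 = 2.
r <- 0.8352458
n <- 4
df <- n - 2

# A larger |r| and a larger n both push the statistic away from zero
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = df)

print(round(t_stat, 4))   # 2.1481
print(round(p_value, 4))  # 0.1648
```

With only two degrees of freedom the t distribution has very heavy tails: a two-sided test at the 0.05 level would need |t| above about 4.30, so t = 2.15 falls well short.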

The p-value is above the conventional threshold of 0.05, so we cannot conclude that the correlation is significant; with only a handful of data points (df = 2), a correlation this large could plausibly arise by chance.
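When the test is part of a script, you can read the p-value from the object that cor.test returns instead of from the printout. The sketch below uses small made-up vectors (hypothetical, not the x and y of the example above) and compares the p.value component against the 0.05 threshold.

```r
# Hypothetical data: four points with a modest positive correlation (r = 0.6)
x <- c(1, 2, 3, 4)
y <- c(2, 1, 4, 3)

ct <- cor.test(x, y)        # Pearson by default; returns an "htest" object

p <- ct$p.value             # the p-value shown in the printed output
is_significant <- p < 0.05  # compare against the conventional threshold

print(p)                    # 0.4
print(is_significant)       # FALSE: four points are too few for r = 0.6
```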

You can also check the correlation by using the confidence interval. In this example, the confidence interval is (−0.638, 0.996). The interval contains zero, so the true correlation could plausibly be zero, meaning no linear relationship at all. Again, you could not be confident that the reported correlation is significant.
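For the Pearson method, cor.test builds this interval with Fisher's z-transformation: it maps r to atanh(r), adds and subtracts a normal quantile times the standard error 1 / sqrt(n - 3), and maps the endpoints back with tanh. The sketch below reproduces the interval from the reported r, again assuming n = 4 as implied by df = 2, and then checks whether zero falls inside.

```r
# Sketch: reproduce cor.test's 95% Pearson interval via Fisher's z-transformation.
# Assumption: r is copied from the output above; n = 4 is inferred from df = 2.
r <- 0.8352458
n <- 4
conf_level <- 0.95

z <- atanh(r)                          # Fisher z of the sample correlation
se <- 1 / sqrt(n - 3)                  # approximate standard error of z
crit <- qnorm((1 + conf_level) / 2)    # about 1.96 for a 95% interval
ci <- tanh(z + c(-1, 1) * crit * se)   # back-transform the endpoints to the r scale

print(round(ci, 4))                    # -0.6380  0.9964
contains_zero <- ci[1] < 0 && ci[2] > 0
print(contains_zero)                   # TRUE
```

With n = 4 the standard error of z is 1, which is why the interval stretches across almost the whole range of possible correlations.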

The cor.test output also includes the point estimate reported by cor (at the bottom, labeled “sample estimates”), saving you the additional step of running cor.
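The same values are available as components of the returned object: estimate holds the point estimate that cor would give, and conf.int holds the confidence interval. A brief sketch, again with hypothetical four-point data rather than the x and y above:

```r
# Hypothetical data (not the x and y from the example above)
x <- c(1, 2, 3, 4)
y <- c(2, 1, 4, 3)

ct <- cor.test(x, y)

est <- unname(ct$estimate)  # the "sample estimates" value, labeled "cor" in the printout
print(est)                  # 0.6, the same value cor(x, y) returns
print(ct$conf.int)          # the interval, carrying a conf.level attribute
```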

By default, cor.test calculates the Pearson correlation, which assumes that the underlying populations are normally distributed. The Spearman method makes no such assumption because it is nonparametric. Use method="spearman" (the method name is lowercase) when working with nonnormal data.
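A short sketch of the Spearman option, using hypothetical data in which y is a curved but perfectly monotone function of x. Spearman's rho is the Pearson correlation of the ranks, so it equals 1 here, while the ordinary Pearson coefficient falls short of 1 because the relationship is not a straight line.

```r
# Hypothetical data: y always increases with x, but not linearly
x <- 1:5
y <- x^2

sp <- cor.test(x, y, method = "spearman")  # the method name is lowercase
print(sp$estimate)                         # rho = 1: the ranks agree perfectly
print(sp$p.value)

# Spearman's rho is Pearson's r computed on the ranks
rho_by_hand <- cor(rank(x), rank(y))
pearson_r <- cor(x, y)                     # below 1, since the trend is curved
print(c(rho_by_hand, pearson_r))
```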