the population levels for each. We can then calculate the correlation using any of a
number of statistical methods; one such method is now presented.
5.6.1 Correlating Data Sets
Suppose we obtain two sets of data and wish to know how closely related they are.
The absolute numbers may not tell the whole story, since the patterns in the data
may remain similar even if the magnitude of the values changes. Here we describe
how to calculate Pearson's sample correlation coefficient, a real number $r \in [-1, 1]$,
which estimates how closely correlated two data sets are. At $r = 1$, there is perfect
correlation, such as between the ordered data sets {1, 2, 3} and {2, 4, 6}. At $r = -1$,
there is perfect negative correlation, meaning that as the data increase in one set,
they decrease by precisely the same proportional amount in the second data set (this
value is obtained for data sets {1, 2, 3} and {−8, −9, −10}, for example). At $r = 0$,
there is no linear relationship between the two data sets (this value is obtained for data
sets {1, 2, 3} and {1, 2, 1}). Naturally, the larger the data sets, the more informative
the correlation coefficient. We now describe how to calculate $r$ in general and then
provide an example.
Definition 5.3 (Pearson's sample correlation coefficient).
Let $x, y$ be data sets consisting of $n$ points (labeled sequentially as $x_1, \ldots, x_n$
and $y_1, \ldots, y_n$), and let $\bar{x}$ and $\bar{y}$ be the mean values of $x$ and $y$
respectively. Then Pearson's sample correlation coefficient $r$ is defined as

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}.
$$
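The definition translates almost line for line into code. The following is a minimal sketch in plain Python, not a reference implementation; the function name `pearson_r` and its error handling are illustrative choices rather than anything prescribed by the text. It is tried on the three small data sets mentioned earlier for r = 1, r = −1 and r = 0, taking the second set of the negative case to be {−8, −9, −10}.

```python
import math


def pearson_r(x, y):
    """Pearson's sample correlation coefficient of two equal-length sequences.

    Hypothetical helper written for illustration only.
    """
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of points")
    n = len(x)
    if n < 2:
        raise ValueError("at least two points are required")

    x_bar = sum(x) / n  # mean of x
    y_bar = sum(y) / n  # mean of y

    # Numerator: sum of products of paired deviations from the means.
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Denominator: square root of the product of the two sums of squares.
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a data set is constant")

    return numerator / math.sqrt(ss_x * ss_y)


r_pos = pearson_r([1, 2, 3], [2, 4, 6])      # perfect positive correlation
r_neg = pearson_r([1, 2, 3], [-8, -9, -10])  # perfect negative correlation
r_zero = pearson_r([1, 2, 3], [1, 2, 1])     # no linear relationship
print(r_pos, r_neg, r_zero)
```

The guard against a zero sum of squares reflects the fact that the formula divides by zero when every value in a data set equals its mean; how to treat that case is a design choice the definition itself leaves open.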
Example 5.2.
Let $x = \{150, 30, 40, 54, 72\}$ and $y = \{72, 18, 10, 30, 40\}$. Then $\bar{x} = 69.2$ and
$\bar{y} = 34$. Then

$$
\begin{aligned}
r &= \frac{\sum_{i=1}^{5} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{5} (x_i - \bar{x})^2 \, \sum_{i=1}^{5} (y_i - \bar{y})^2}} \\
  &= \frac{(80.8 \cdot 38) + (-39.2 \cdot -16) + (-29.2 \cdot -24) + (-15.2 \cdot -4) + (2.8 \cdot 6)}{\sqrt{\left(80.8^2 + (-39.2)^2 + (-29.2)^2 + (-15.2)^2 + 2.8^2\right)\left(38^2 + (-16)^2 + (-24)^2 + (-4)^2 + 6^2\right)}} \\
  &= \frac{4476}{\sqrt{9156.8 \cdot 2328}} \\
  &\approx 0.969.
\end{aligned}
$$
Thus we see that these two sets of data are in fact very closely positively correlated,
in keeping with the observation that the values in y are approximately one half of the
corresponding values in x .
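The arithmetic in Example 5.2 can be checked step by step. The short script below is only a sketch for verification; the variable names (`dx`, `dy`, `ss_x`, `ss_y`) are illustrative and do not come from the text. It recomputes the two means, the sum of products of deviations, the two sums of squared deviations, and finally r.

```python
import math

x = [150, 30, 40, 54, 72]
y = [72, 18, 10, 30, 40]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Deviations of each point from its data set's mean.
dx = [xi - x_bar for xi in x]
dy = [yi - y_bar for yi in y]

numerator = sum(a * b for a, b in zip(dx, dy))  # sum of paired products
ss_x = sum(a * a for a in dx)                   # sum of squared x deviations
ss_y = sum(b * b for b in dy)                   # sum of squared y deviations
r = numerator / math.sqrt(ss_x * ss_y)

print(f"x_bar = {x_bar:.1f}, y_bar = {y_bar:.1f}")
print(f"numerator = {numerator:.1f}")
print(f"ss_x = {ss_x:.1f}, ss_y = {ss_y:.1f}")
print(f"r = {r:.3f}")
```

Up to floating-point rounding, the printed intermediate values should match the figures in the example: 4476 for the numerator, 9156.8 and 2328 for the two sums of squares, and r ≈ 0.969.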