where $x_i = X_i - \langle X \rangle$ (the difference between an observed value of $X_i$ and its expected value, $\langle X \rangle$, which is the sample mean) and $y_i = Y_i - \langle Y \rangle$ (the difference between an observed value of $Y_i$ and its expected value, $\langle Y \rangle$); $x_i$ and $y_i$ are called centered versions of
the original variables. Thus, we are summing residuals, or deviations from expected
values, over all N individuals in a population. By minimizing this function, we will obtain
the best estimates for m and b.
To find the values of m and b that minimize the sum of squared residuals, we take the
partial derivative of that sum with respect to m and with respect to b and set each to zero.
As you recall from calculus, the derivative of a function is zero at its maxima and minima.
We then solve the resulting equations for m and b. Using this optimization method, the
equation for the slope, m, can be written as:
$$m = \frac{\sum xy}{\sum x^2} \tag{8.3}$$
which is the sum of the products of the deviations divided by the sum of the squared
deviations of the X values (each sum is taken over all individuals). In other words, the
slope is the ratio of the deviations of Y to the corresponding deviations of X. When the
corresponding deviations are identical, the slope is one; when the deviations of Y are a
consistent multiple of the deviations of X, the slope will be that multiple.
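As a minimal sketch of Eq. 8.3, the slope can be computed directly from the centered values. The function name `slope_from_deviations` and the small data sets are illustrative assumptions, not taken from the text; the two cases mirror the statements above (identical deviations, and deviations of Y that are a consistent multiple of those of X).

```python
def slope_from_deviations(xs, ys):
    """Slope m = sum(x*y) / sum(x^2), with x and y the centered values (Eq. 8.3)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered versions of the original variables.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))  # sum of products of deviations
    sum_xx = sum(a * a for a in dx)              # sum of squared deviations of X
    return sum_xy / sum_xx


# Hypothetical data: Y_same has the same deviations as X,
# Y_triple has deviations three times those of X.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y_same = [x + 10.0 for x in X]
Y_triple = [3.0 * x - 2.0 for x in X]

print(slope_from_deviations(X, Y_same))    # slope of one
print(slope_from_deviations(X, Y_triple))  # slope of three
```

Adding a constant to Y shifts its mean but leaves the deviations unchanged, which is why the first case gives a slope of one regardless of the offset.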
Substituting $X_i - \langle X \rangle$ for $x_i$ and $Y_i - \langle Y \rangle$ for $y_i$ allows us to compute m directly
from the observed values. The sum of the products can be written as:
$$\sum xy = \sum (X_i - \langle X \rangle)(Y_i - \langle Y \rangle) \tag{8.4}$$
which, up to a factor of N that cancels when the slope is formed, can be simplified to:
$$N \sum X_i Y_i - \sum X_i \sum Y_i \tag{8.5}$$
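The simplification can be worked through in a few lines. This is a sketch of the algebra using only the definitions of the sample means, $\langle X \rangle = \frac{1}{N}\sum X_i$ and $\langle Y \rangle = \frac{1}{N}\sum Y_i$, with every sum taken over all N individuals:

$$
\begin{aligned}
\sum (X_i - \langle X \rangle)(Y_i - \langle Y \rangle)
 &= \sum X_i Y_i - \langle Y \rangle \sum X_i - \langle X \rangle \sum Y_i + N \langle X \rangle \langle Y \rangle \\
 &= \sum X_i Y_i - \frac{1}{N} \sum X_i \sum Y_i
\end{aligned}
$$

Multiplying both sides by N gives $N \sum X_i Y_i - \sum X_i \sum Y_i$. The same steps applied to the sum of squared deviations give $N \sum X_i^2 - \left(\sum X_i\right)^2$, so the factor of N appears in both the numerator and the denominator of the slope and cancels.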
After applying a similar substitution and simplification to the sum of the squared deviations, we can write:
$$m = \frac{N \sum_{i=1}^{N} X_i Y_i - \sum_{i=1}^{N} X_i \sum_{i=1}^{N} Y_i}{N \sum_{i=1}^{N} X_i^2 - \left( \sum_{i=1}^{N} X_i \right)^2} \tag{8.6}$$
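The raw-sum form of the slope can be checked numerically against the centered form. The sketch below assumes small made-up data; the function names `slope_from_raw_sums` and `slope_centered` are hypothetical and chosen only for this illustration.

```python
def slope_from_raw_sums(xs, ys):
    """Slope from uncentered sums: (N*SXY - SX*SY) / (N*SXX - SX^2)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)


def slope_centered(xs, ys):
    """Slope from deviations about the means: sum(x*y) / sum(x^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Hypothetical observations, for illustration only.
X = [1.0, 2.0, 4.0, 7.0]
Y = [2.0, 3.0, 7.0, 12.0]

# The two forms agree up to floating-point rounding.
print(slope_from_raw_sums(X, Y), slope_centered(X, Y))
```

The raw-sum form needs only running totals, which makes it convenient for hand calculation; with values that are large relative to their spread, the centered form is usually less prone to rounding error.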
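Now that we have an expression for the slope, we can solve for the intercept, b, and
complete the equation for the regression. When $b = 0$, $\langle Y \rangle = m \langle X \rangle$; more generally, the fitted line passes through the point of means, so $\langle Y \rangle = m \langle X \rangle + b$, and we can calculate
b from the observed values, $X_i$ and $Y_i$, and the sample size, N: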
$$b = \langle Y \rangle - m \langle X \rangle = \frac{\sum_{i=1}^{N} Y_i - m \sum_{i=1}^{N} X_i}{N} \tag{8.7}$$
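Putting the slope and intercept together, a complete fit might be sketched as follows. The function name `fit_line` and the data are assumptions made for illustration; the slope uses the raw-sum form and the intercept uses the sum of Y minus m times the sum of X, divided by N.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line Y = m*X + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Slope from the raw sums.
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # Intercept: mean of Y minus m times mean of X.
    b = (sum_y - m * sum_x) / n
    return m, b


# Hypothetical observations, for illustration only.
X = [1.0, 2.0, 3.0, 4.0]
Y = [3.0, 5.0, 7.0, 10.0]

m, b = fit_line(X, Y)
mean_x = sum(X) / len(X)
mean_y = sum(Y) / len(Y)

print(m, b)
# The fitted line passes through the point of means (up to rounding).
print(abs(mean_y - (m * mean_x + b)) < 1e-12)
```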
In addition to an estimate of the value of m, we will also need measures of the uncertainty
of that estimate. These measures will be used to test whether m is significantly different
from zero (because if we cannot say that, we cannot claim that Y depends on X),
and to test whether the value of m differs between samples (that is, whether the relationship
between X and Y differs from one sample to another).