parental stature to be the independent variable (X) and the grown child's
stature to be the dependent variable (Y).
Plotting average parental stature versus child's stature gives the
graph in Figure 4-7, suggesting a possible linear relationship
$Y = aX + b$ between the variables X and Y. The line in the figure is
the line that "best fits" the data set. This line is called the
(least-squares) regression line. In Chapter 5, we shall examine a
criterion for best fit and how the coefficients a and b for this
line can be determined from the data.
FIGURE 4-7.
Scatter plot of the parent-child data with a plot
of the least-squares regression line and residuals.
Denote the vertical distances of the data points from the line
$Y = aX + b$ by $r_1, r_2, \ldots, r_n$. These numbers, calculated as
$r_i = |Y_i - (aX_i + b)|$, $i = 1, 2, \ldots, n$, give the variation
in the Y variable from the straight line relationship (see Figure 4-7).
The sum of squared residuals (SSR) measure, defined as:

$$\mathrm{SSR} = r_1^2 + r_2^2 + \cdots + r_n^2 = \sum_{i=1}^{n} r_i^2, \qquad \text{(4-4)}$$

is the most frequently used measure to express the combined variance of
the data from the regression line.
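As a minimal sketch of these definitions, the following Python snippet
computes the residuals and the SSR for a candidate line. The data values,
the function names `residuals` and `ssr`, and the two trial pairs of
coefficients are assumptions made for illustration; they are not the
parent-child data plotted in Figure 4-7.

```python
# Hypothetical (parent, child) statures in feet; NOT the book's data.
parent = [5.2, 5.4, 5.6, 5.8, 6.0]
child = [5.3, 5.3, 5.7, 5.7, 6.1]


def residuals(xs, ys, a, b):
    """Vertical distances r_i = |Y_i - (a*X_i + b)| from the line Y = aX + b."""
    return [abs(y - (a * x + b)) for x, y in zip(xs, ys)]


def ssr(xs, ys, a, b):
    """Sum of squared residuals, as in equation (4-4)."""
    return sum(r ** 2 for r in residuals(xs, ys, a, b))


# Two trial lines with assumed coefficients (a, b).
ssr_line_1 = ssr(parent, child, 1.0, 0.0)   # Y = X
ssr_line_2 = ssr(parent, child, 0.5, 2.8)   # Y = 0.5X + 2.8

print("SSR for Y = X:         ", round(ssr_line_1, 4))
print("SSR for Y = 0.5X + 2.8:", round(ssr_line_2, 4))
```

Of the two trial lines, the one with the smaller SSR lies closer to the
points overall, which is the sense in which SSR measures fit.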
The regression line in the figure represents the mathematical model
that explains the variance in the data caused by genetic factors. The
value of SSR, on the other hand, represents the variance caused by other
factors.
A second sum of squares, often called the total sum of squares (TSS),
can be used to assess the total variation among the observed Y values.
It is calculated as the sum of the squared residuals around the mean
$\bar{Y}$ of the Y values (see Figure 4-8):

$$\mathrm{TSS} = (Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \cdots + (Y_n - \bar{Y})^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2.$$
It can be shown (and is somewhat obvious from the graphs)
that for any set of points $\mathrm{SSR} \le \mathrm{TSS}$, and the equality is only
possible when the regression line is horizontal; that is, when
the regression line is $Y = \bar{Y}$. The difference $\mathrm{TSS} - \mathrm{SSR}$ gives
the variance explained by the model, in this case, the
regression line.
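The relationship between SSR and TSS can be checked numerically. The
sketch below uses hypothetical data and the standard closed-form
least-squares coefficients, which is an assumption here since the text
defers their derivation to Chapter 5. All function and variable names
are illustrative.

```python
# Hypothetical (parent, child) statures in feet; NOT the book's data.
parent = [5.2, 5.4, 5.6, 5.8, 6.0]
child = [5.3, 5.3, 5.7, 5.7, 6.1]


def mean(values):
    return sum(values) / len(values)


def least_squares_line(xs, ys):
    """Slope a and intercept b of the least-squares line (standard closed form)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx
    return a, y_bar - a * x_bar


def ssr(xs, ys, a, b):
    """Sum of squared residuals around the line Y = aX + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


def tss(ys):
    """Total sum of squares around the mean of the Y values."""
    y_bar = mean(ys)
    return sum((y - y_bar) ** 2 for y in ys)


a, b = least_squares_line(parent, child)
total = tss(child)
unexplained = ssr(parent, child, a, b)
explained = total - unexplained            # TSS - SSR

# For the horizontal line Y = mean(Y), SSR coincides with TSS.
flat = ssr(parent, child, 0.0, mean(child))

print(f"a = {a:.3f}, b = {b:.3f}")
print(f"TSS = {total:.4f}, SSR = {unexplained:.4f}, TSS - SSR = {explained:.4f}")
print(f"SSR of horizontal line = {flat:.4f}")
```

For this made-up data set the fitted line leaves only a small part of
the total variation unexplained; with real data the split between SSR
and TSS - SSR depends on how strongly the two variables are related.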
The coefficients of the least squares regression line, together with
the quantities SSR and TSS, can be obtained as part of the regression
output from all standard statistical software. Here is the MINITAB
output: