Variation between data sets
Two basic procedures are frequently used in quantitative genetics to interpret the variation and relationship that exist between characters, or between one character evaluated in different environments. These are simple linear regression and correlation.

A straight line regression can be adequately described by two estimates, the slope or gradient of the line (b) and the intercept on the y-axis (a). These are related by the equation:

y = bx + a

It can be seen that b is the gradient of the line, because a change of one unit on the x-axis results in a change of b units on the y-axis. If x and y both increase (or both decrease) together, the gradient is positive. If, however, x increases while y decreases or vice versa, then the gradient is negative. When x = 0, the equation for y reduces to y = a, and a is therefore the point at which the regression line crosses the y-axis. This intercept value may be equal to, greater than or less than zero.

The formulation and theory behind regression analysis will not be described here and are not within the scope of this topic. However, the gradient of the best fitting straight line (also known as the regression coefficient) for a collection of points whose coordinates are x and y is estimated as:

b = SP(x, y)/SS(x)

where SP(x, y) is the sum of products of the deviations of x and y from their respective means (x̄ and ȳ), and SS(x) is the sum of the squared deviations of x from its mean. It will be useful to have an understanding of the regression analysis and to remember the basic regression equations.

Notice that a sum of squares is really a special case of a sum of products. You should also note that if every y value is exactly equal to every x value, then the equation used to estimate b becomes SS(x)/SS(x) = 1.

Having determined b, the intercept value is found by substituting the mean values of x and y into the rearranged equation:

a = ȳ − bx̄

In regression analysis it is always assumed that one character is the dependent variable and the other is independent. For example, it is common to compare parental performance with progeny performance (see Chapter 6), and in this case progeny performance would be considered the dependent variable and parental performance the independent variable. The performance of progeny is obviously dependent on the performance of their parents, and not vice versa.
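The regression estimates can be checked numerically. The sketch below is a minimal implementation of the equations above; the parental and progeny scores are invented purely for illustration, not taken from the text:

```python
# Estimate the regression slope b = SP(x, y)/SS(x) and the
# intercept a = y_bar - b * x_bar, following the equations
# in the text. The data values are hypothetical.

def regression_estimates(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of products of deviations from the respective means
    sp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Sum of squared deviations of x from its mean
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b = sp_xy / ss_x
    a = y_bar - b * x_bar
    return b, a

# Hypothetical parental (x) and progeny (y) performance scores
x = [4.0, 5.0, 6.0, 7.0, 8.0]
y = [5.0, 5.5, 7.0, 7.5, 9.0]

b, a = regression_estimates(x, y)
print(b, a)
```

Note that if y is given the same values as x, the slope comes out as SS(x)/SS(x) = 1, as remarked above.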
The degree of association between any two, or a number of, different characters can be examined statistically by the use of correlation analysis. Correlation analysis is similar in many ways to simple regression, but in correlations there is no need to assign one set of values to be the dependent variable while the other is said to be the independent variable. Correlation coefficients (r) are calculated from the equation:

r = SP(x, y)/√[SS(x) × SS(y)]

where SP(x, y) is again the sum of products between the two variables, SS(x) is the sum of squares of one variable (x) and SS(y) is the sum of squares of the second variable (y).
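The correlation coefficient can be sketched in the same minimal way; the data below are again invented for illustration. Two sets of values that are perfectly linearly related give r = 1 (or r = −1 when one decreases as the other increases):

```python
import math

# Correlation coefficient r = SP(x, y) / sqrt(SS(x) * SS(y)),
# computed from deviations about the means. Illustrative data only.

def correlation(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return sp_xy / math.sqrt(ss_x * ss_y)

# Perfectly (positively) linearly related values give r = 1
print(correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```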
Now, SP(x, y) is given by the equation:

SP(x, y) = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ)

although in practice it is usually easier to calculate it using the equation:

SP(x, y) = Σᵢ₌₁ⁿ xᵢyᵢ − [(Σᵢ₌₁ⁿ xᵢ)(Σᵢ₌₁ⁿ yᵢ)]/n

The comparable equations for SS(x) are:

SS(x) = Σᵢ₌₁ⁿ (xᵢ − x̄)²

SS(x) = Σᵢ₌₁ⁿ xᵢ² − (Σᵢ₌₁ⁿ xᵢ)²/n

and SS(y) is obtained in the same way from the y values:

SS(y) = Σᵢ₌₁ⁿ (yᵢ − ȳ)²

Dividing SP(x, y), SS(x) and SS(y) by their degrees of freedom (n − 1) gives, respectively, the covariance between x and y and the variances of x and y.
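The deviation form and the computational shortcut of each equation give identical results, which can be verified numerically. The sketch below checks both forms of SP(x, y) and SS(x) on a small invented data set:

```python
# Check that the deviation form and the computational shortcut
# for SP(x, y) and SS(x) agree. Data values are hypothetical.

def sp_deviation(x, y):
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

def sp_shortcut(x, y):
    n = len(x)
    return sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

def ss_deviation(x):
    x_bar = sum(x) / len(x)
    return sum((xi - x_bar) ** 2 for xi in x)

def ss_shortcut(x):
    n = len(x)
    return sum(xi ** 2 for xi in x) - sum(x) ** 2 / n

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 5.0, 4.0, 6.0]

print(sp_deviation(x, y), sp_shortcut(x, y))
print(ss_deviation(x), ss_shortcut(x))
# A sum of squares is a special case of a sum of products:
# sp_deviation(x, x) equals ss_deviation(x).
```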