Variation between data sets
Two basic procedures are frequently used in quantitative genetics to interpret the variation and relationships that exist between characters, or between one character evaluated in different environments. These are simple linear regression and correlation.

A straight line regression can be adequately described by two estimates, the slope or gradient of the line (b) and the intercept on the y-axis (a). These are related by the equation:
y = a + bx

It can be seen that b is the gradient of the line, because a change of one unit on the x-axis results in a change of b units on the y-axis. If x and y both increase (or both decrease) together, the gradient is positive. If, however, x increases while y decreases, or vice versa, then the gradient is negative. When x = 0, the equation for y reduces to:

y = a

and a is therefore the point at which the regression line crosses the y-axis. This intercept value may be equal to, greater than or less than zero.

The formulation and theory behind regression analysis will not be described here and are not within the scope of this topic. However, the gradient of the best fitting straight line (also known as the regression coefficient) for a collection of points whose coordinates are x and y is estimated as:

b = SP(x, y) / SS(x)

where SP(x, y) is the sum of products of the deviations of x and y from their respective means (x̄ and ȳ), and SS(x) is the sum of the squared deviations of x from its mean. It will be useful to have an understanding of regression analysis and to remember the basic regression equations.

Now, SP(x, y) is given by the equation:

SP(x, y) = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ)

although in practice it is usually easier to calculate it using the equation:

SP(x, y) = Σᵢ₌₁ⁿ xᵢyᵢ − [(Σᵢ₌₁ⁿ xᵢ)(Σᵢ₌₁ⁿ yᵢ)] / n

The comparable equations for SS(x) are:

SS(x) = Σᵢ₌₁ⁿ (xᵢ − x̄)²

SS(x) = Σᵢ₌₁ⁿ xᵢ² − (Σᵢ₌₁ⁿ xᵢ)² / n

Notice that a sum of squares is really a special case of a sum of products. You should also note that if every y value is exactly equal to every x value, then the equation used to estimate b becomes SS(x)/SS(x) = 1.

Having determined b, the intercept value is found by substituting the mean values of x and y into the rearranged equation:

a = ȳ − b x̄

In regression analysis it is always assumed that one character is the dependent variable and the other is independent. For example, it is common to compare parental performance with progeny performance (see Chapter 6), and in this case progeny performance would be considered the dependent variable and parental performance the independent variable. The performance of progeny is obviously dependent on the performance of their parents, and not vice versa.

The degree of association between any two, or a number of, different characters can be examined statistically by the use of correlation analysis. Correlation analysis is similar in many ways to simple regression, but in correlations there is no need to assign one set of values to be the dependent variable while the other is said to be the independent variable. Correlation coefficients (r) are calculated from the equation:

r = SP(x, y) / √[SS(x) × SS(y)]

where SP(x, y) is again the sum of products between the two variables, SS(x) is the sum of squares of one variable (x) and SS(y) is the sum of squares of the second variable (y), and:

SS(x) = Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1)

SS(y) = Σᵢ₌₁ⁿ (yᵢ − ȳ)² / (n − 1)

SP(x, y) = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

Note that dividing SP(x, y), SS(x) and SS(y) by (n − 1) does not alter r, because the divisors cancel in the ratio.
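The regression and correlation calculations described above can be checked numerically. A minimal sketch in Python, using invented x and y values purely for illustration (they do not come from any real data set):

```python
import math

# Invented example data: e.g. parental (x) and progeny (y) performance scores.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [3.0, 6.0, 5.0, 9.0, 12.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sum of products and sums of squares of deviations from the means.
sp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)

# The computational forms give the same values without first
# finding the means.
sp_xy_alt = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
ss_x_alt = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n

# Regression coefficient (slope) and intercept of y = a + b*x.
b = sp_xy / ss_x
a = y_bar - b * x_bar

# Correlation coefficient; any (n - 1) divisors would cancel here,
# so they are omitted.
r = sp_xy / math.sqrt(ss_x * ss_y)

print(b, a, r)
```

With these invented values, SP(x, y) = 42 and SS(x) = 40, so b = 42/40 = 1.05, a = 7 − 1.05 × 6 = 0.7, and r = 42/√(40 × 50) ≈ 0.94, a strong positive association.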