Biology Reference
In-Depth Information
have a χ² distribution with a parameter equal to the degrees of freedom.
Therefore, the ratio (regression mean square error)/(residual mean
square error), which is a quotient of two χ² distributions, would have an
F-distribution. As the regression line has 1 degree of freedom, the
χ² distribution corresponding to the regression mean square error has
parameter 1. In our example, the value of the regression mean square
error is 0.53646. The χ² distribution that corresponds to the residual
mean square error has 8 degrees of freedom (because we have 10 data
points and two parameters of the regression line), so the residual mean
square error is 0.28579/8 = 0.03572. Finally, this implies the quotient
(regression mean square error)/(residual mean square error) has an
F-distribution with (1,8) degrees of freedom.
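The arithmetic in this paragraph can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and 0.53646 and 0.28579 are interpreted here as the regression and residual sums of squares, which is what the division by the degrees of freedom in the text implies.

```python
# Values quoted in the text; the variable names are illustrative assumptions.
regression_ss = 0.53646   # regression sum of squares (1 degree of freedom)
residual_ss = 0.28579     # residual sum of squares
n_points = 10             # data points in the example
n_params = 2              # slope and intercept of the regression line

regression_df = 1
residual_df = n_points - n_params   # 10 - 2 = 8

# A mean square is a sum of squares divided by its degrees of freedom.
regression_ms = regression_ss / regression_df
residual_ms = residual_ss / residual_df

# The F statistic is the ratio of the two mean squares.
f_ratio = regression_ms / residual_ms

print(f"residual mean square = {residual_ms:.5f}")
print(f"F = {f_ratio:.2f} with ({regression_df}, {residual_df}) degrees of freedom")
```

With the numbers from the text, this prints a residual mean square of 0.03572 and F = 15.02.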
The value of the quotient (regression mean square error)/(residual mean
square error) for our example is F = 0.53646/0.03572 = 15.02. Following
the procedure illustrated with Figure 4-4(C), what is the probability of
obtaining a value at least this large if H0 were true? The answer is found
using software that computes the F-distribution or from F-distribution
tables, taking into account the degrees of freedom we determined. The
p-value corresponding to the F value of 15.02 with degrees of freedom
(1,8) is p = 0.005. As this is less than the standard significance level
of 0.05, we can conclude that the null hypothesis should be rejected. That
is, we cannot assume zero heritability, showing the contribution of the
genetic factor in this example is significant.
The F-test answers our second question—how to decide whether the
contribution of an underlying genetic factor is significant, relative to
environmental factors. In our example, we obtained an affirmative
answer to this question under the assumption of a linear relationship
between the factors.
We have now discussed two common types of hypothesis: whether
there is a difference between two means and whether there is a difference
between two variances. In analyzing the question of a difference in the
means, we shall usually analyze the data using a t-test, because the
variances are usually unknown. When comparing variances, the
appropriate statistical test is an F-test, because the underlying sampling
distribution is approximately an F-distribution. In any case, all
calculations are carried out by statistical software. What is important is
to know how to choose the appropriate statistical test and how to
interpret the software output.
In closing, we can now answer the third question we asked at the
beginning of the chapter; namely, what is the common mathematical
thread that links all statistical tests comparing means, evaluating the
contribution of various factors, or testing the linear dependence of an
outcome on a set of predictive variables? The common mathematical
background of all of the tests we considered is the underlying
normal distribution of the data, the common paradigm of formulating