show one approach to calculating variances based on sums of squares, using the estimated
means of the specimens in each level of factor A. This approach is conceptually easy to
understand, but it is not in the matrix notation we will need to use in the next chapter,
and it is not the approach used in most computer-based calculations. Most modern
approaches to calculating the sums of squares use matrix algebra, and the differences
between the sums of squares are explained by models expressed in terms of design
matrices. The simple summation methods presented below are easier to understand at an
introductory level, but they are difficult to scale up to larger problems and are probably
more prone to rounding errors. Researchers interested in programming their own GLM methods
will need to consult more advanced texts to develop a complete understanding of these
approaches (Anderson, 2001a, b, 2006; Anderson and Robinson, 2001; Rencher and
Schaalje, 2008, for starters).
Suppose that we have a univariate dependent variable $Y$, which depends on a factor A
with $J$ distinct levels and $n_j$ specimens in level $j$. We will not require that there be
equal numbers of specimens in each level (also called a cell) at this point. However, we
will require that $Y$ be centered, i.e. its mean value is zero, thereby removing one degree of
freedom. For the $i$th specimen ($i = 1$ to $n_j$) in cell $j$ ($j = 1$ to $J$) we have the model

$$Y_{ij} = \alpha_j + \varepsilon_{ij} \qquad (8.20)$$

where $\alpha_j$ is the contribution of the $j$th level of the factor to the value, and
$\varepsilon_{ij}$ is the error.
Notice that we require that the mean value of the residual terms $\varepsilon_{ij}$ be zero,
with variance $\sigma_e^2$, and also that the mean value of $Y$ be zero (because $Y$ is
centered). Consequently, the $n_j \alpha_j$ terms summed over all the cells must also equal zero,

$$\sum_{j=1}^{J} n_j \alpha_j = 0 \qquad (8.21)$$
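To see where this constraint comes from, one can average Equation 8.20 over all specimens. The following is only a sketch of that step; $N = \sum_j n_j$ (the total number of specimens) is a symbol introduced here for convenience, and the residuals are assumed to average exactly zero:

```latex
0 = \bar{Y}
  = \frac{1}{N} \sum_{j=1}^{J} \sum_{i=1}^{n_j} \left( \alpha_j + \varepsilon_{ij} \right)
  = \frac{1}{N} \sum_{j=1}^{J} n_j \alpha_j
    + \frac{1}{N} \sum_{j=1}^{J} \sum_{i=1}^{n_j} \varepsilon_{ij}
  = \frac{1}{N} \sum_{j=1}^{J} n_j \alpha_j
```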
As a result, there are $J$ values for $\alpha_j$ but only $(J - 1)$ of them are independent
because the constraint that they sum to zero removes one degree of freedom. Some authors
include a mean value of $Y$, $\bar{Y}$ or $\mu$, in the expression, rather than requiring
that $Y$ be centered, so you may see the form

$$Y_{ij} = \mu + \alpha_j + \varepsilon_{ij} \qquad (8.22)$$
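As a minimal sketch of Equations 8.20 to 8.22, the level effects can be estimated as the cell means of the centered data, and the weighted zero-sum constraint checked numerically. The data set, the level names (`a1`, `a2`, `a3`) and the variable names below are hypothetical and chosen only for illustration; they do not come from the text.

```python
from statistics import fmean

# Hypothetical measurements for J = 3 levels of factor A (unequal cell sizes).
cells = {
    "a1": [4.0, 5.0, 6.0],
    "a2": [7.0, 9.0],
    "a3": [1.0, 2.0, 3.0, 2.0],
}

# Grand mean over all specimens (mu in Eq. 8.22).
all_values = [y for ys in cells.values() for y in ys]
mu = fmean(all_values)

# Center Y so that its mean is zero, as assumed for Eq. 8.20.
centered = {level: [y - mu for y in ys] for level, ys in cells.items()}

# Estimate each alpha_j as the mean of the centered values in cell j.
alpha = {level: fmean(ys) for level, ys in centered.items()}

# Weighted sum of the effects, sum_j n_j * alpha_j (Eq. 8.21).
# It should be zero up to floating-point rounding.
weighted_sum = sum(len(centered[level]) * alpha[level] for level in cells)
print(alpha)
print(weighted_sum)
```

Note that it is the sum weighted by the cell sizes $n_j$ that vanishes; with unequal cell sizes, the unweighted sum of the estimated effects is generally not zero.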
We can now look at variance partitioning, i.e. splitting the variation into the portion
explained by the factor and that left unexplained, which is typically called the residual or
error term, just as in linear regression. We need to do this variance partitioning to
understand how to form F-ratios in the context of a one-way ANOVA. We will do this by first
looking at the summed squared deviations around the mean value ($\bar{Y}$), then splitting that
into two terms, the first being the scatter about the mean ($\bar{Y}_j$) of each level of factor A,
and the other being the scatter of the mean values of each level about the total mean. The total
sum of squares is given by:
$$SS_{\mathrm{total}} = \sum_{j=1}^{J} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y})^2 = \sum_{j=1}^{J} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2 + \sum_{j=1}^{J} n_j \bar{Y}_j^2 \qquad (8.23)$$
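The partition of the total sum of squares into a within-level term and a between-level term can be checked with the simple summation approach described above. This is a sketch under stated assumptions, not the matrix-based method used by most software: the function name `partition_sums_of_squares` and the small data set are hypothetical. The between-level term is written as $n_j (\bar{Y}_j - \bar{Y})^2$, which reduces to $n_j \bar{Y}_j^2$ when $Y$ is centered so that $\bar{Y} = 0$.

```python
from statistics import fmean


def partition_sums_of_squares(cells):
    """Return (ss_total, ss_within, ss_between) for a dict of level -> values."""
    all_values = [y for ys in cells.values() for y in ys]
    grand_mean = fmean(all_values)

    # Scatter of every specimen about the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)

    ss_within = 0.0
    ss_between = 0.0
    for ys in cells.values():
        cell_mean = fmean(ys)
        # Scatter of the specimens about their own level mean.
        ss_within += sum((y - cell_mean) ** 2 for y in ys)
        # Scatter of the level mean about the grand mean, weighted by n_j.
        ss_between += len(ys) * (cell_mean - grand_mean) ** 2
    return ss_total, ss_within, ss_between


# Hypothetical data: J = 3 levels with unequal numbers of specimens.
cells = {
    "a1": [4.0, 5.0, 6.0],
    "a2": [7.0, 9.0],
    "a3": [1.0, 2.0, 3.0, 2.0],
}
ss_total, ss_within, ss_between = partition_sums_of_squares(cells)
# For these data: ss_total is about 56, ss_within about 6, ss_between about 50
# (up to floating-point rounding), so the two parts add up to the total.
print(ss_total, ss_within, ss_between)
```

Because the function subtracts the grand mean itself, it gives the same result whether or not the data have been centered beforehand.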