Statistics - Geometric Morphometrics for Biologists


show one approach to calculating variances based on sums of squares, using the estimated
means of the specimens in each level of factor A. This approach is conceptually easy to
understand, but it is not in the matrix notation we will need to use in the next chapter,
and it is not the approach used in most computer-based calculations. Most modern
approaches calculate the sums of squares using matrix algebra, and the differences
between the sums of squares are explained by models expressed in terms of design
matrices. The simple summation methods presented below are easier to understand at an
introductory level, but they are difficult to scale up to larger problems and are probably
more prone to rounding errors. Researchers interested in programming their own GLM
methods will need to consult more advanced texts to develop a complete understanding of
these approaches (for starters, Anderson, 2001a, b, 2006; Anderson and Robinson, 2001;
Rencher and Schaalje, 2008).

Suppose that we have a univariate dependent variable $Y$, which depends on a factor A,
which has $J$ distinct levels and $n_j$ specimens per level. We will not require that there
be equal numbers of specimens in each level (also called a cell) at this point. However, we
will require that $Y$ be centered, i.e. its mean value is zero, thereby removing one degree
of freedom. For the $i$th ($i = 1$ to $n_j$) specimen in cell $j$ ($j = 1$ to $J$) we have the model

$$Y_{ij} = \alpha_j + \varepsilon_{ij} \tag{8.20}$$

where $\alpha_j$ is the contribution of the $j$th level of the factor to the value, and
$\varepsilon_{ij}$ is the error.

Notice that we require that the mean value of the residual terms $\varepsilon_{ij}$ be zero,
with variance $\sigma_e^2$, and also that the mean value of $Y$ be zero (because $Y$ is
centered). Consequently, the $n_j \alpha_j$ terms summed over all the cells must also equal zero,

$$\sum_{j=1}^{J} n_j \alpha_j = 0 \tag{8.21}$$

As a result, there are $J$ values for $\alpha_j$ but only $(J - 1)$ of them are independent,
because the constraint that they sum to zero removes one degree of freedom. Some authors
include a mean value of $Y$, $\bar{Y}$ or $\mu$, in the expression, rather than requiring that
$Y$ be centered, so you may see the form

$$Y_{ij} = \mu + \alpha_j + \varepsilon_{ij} \tag{8.22}$$
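The constraint in (8.21) and the link between the centered form (8.20) and the uncentered form (8.22) can be checked numerically. The following is only a minimal sketch: the data values are made up for illustration, and it assumes that each $\alpha_j$ is estimated by the mean of the centered values in its cell.

```python
import numpy as np

# Hypothetical data: three levels of factor A with unequal cell sizes.
cells = [
    np.array([4.1, 5.0, 4.6]),
    np.array([6.2, 5.8, 6.5, 6.1]),
    np.array([3.0, 3.4]),
]

y_all = np.concatenate(cells)
grand_mean = y_all.mean()  # plays the role of mu in (8.22)

# Center Y so that its overall mean is zero, as (8.20) requires.
centered = [c - grand_mean for c in cells]

n = np.array([len(c) for c in centered])            # n_j
# Assumption: alpha_j is estimated by the cell mean of the centered data.
alpha_hat = np.array([c.mean() for c in centered])

# Constraint (8.21): the n_j-weighted sum of the alpha_j is zero
# (up to floating-point rounding).
weighted_sum = float(np.sum(n * alpha_hat))
print(weighted_sum)

# In the uncentered form (8.22), mu + alpha_j recovers the raw cell means.
raw_cell_means = np.array([c.mean() for c in cells])
print(np.allclose(grand_mean + alpha_hat, raw_cell_means))
```

Because the cells have unequal sizes here, it is the weighted sum of the estimated $\alpha_j$ that vanishes; their unweighted sum generally does not.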

We can now look at variance partitioning, i.e. splitting the variation into the portion
explained by the factor and that left unexplained, which is typically called the residual or
error term, just as in linear regression. We need to do this variance partitioning to
understand how to form F-ratios in the context of a one-way ANOVA. We will do this by first
looking at the summed square values around the mean value ($\bar{Y}$), then splitting that
into two terms, the first being the scatter about the mean ($\bar{Y}_j$) of each level of
factor A, and the other being the scatter of the mean values of each level about the total
mean. The total sum of squares is given by:

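$$SS_{total} = \sum_{j=1}^{J} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y})^2 = \sum_{j=1}^{J} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2 + \sum_{j=1}^{J} n_j \bar{Y}_j^2 \tag{8.23}$$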
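The partition of the total sum of squares into scatter within each level and scatter of the level means about the total mean can be verified with the simple summation approach. This is a sketch under stated assumptions, not the matrix-based calculation used by most software: the data are invented, and the variable names (`ss_total`, `ss_within`, `ss_between`) are labels chosen for this example.

```python
import numpy as np

# Hypothetical data: three levels of factor A with unequal cell sizes.
raw_cells = [
    np.array([4.1, 5.0, 4.6]),
    np.array([6.2, 5.8, 6.5, 6.1]),
    np.array([3.0, 3.4]),
]

# Center Y so that the total mean is zero, as the model requires.
grand_mean = np.concatenate(raw_cells).mean()
cells = [c - grand_mean for c in raw_cells]
y_all = np.concatenate(cells)

# Total sum of squares about the overall mean (zero after centering).
ss_total = float(np.sum((y_all - y_all.mean()) ** 2))

# Scatter of each specimen about the mean of its own level.
ss_within = float(sum(np.sum((c - c.mean()) ** 2) for c in cells))

# Scatter of the level means about the total mean. Because Y is centered,
# n_j * (mean_j - 0)^2 reduces to n_j * mean_j^2.
ss_between = float(sum(len(c) * c.mean() ** 2 for c in cells))

print(ss_total, ss_within + ss_between)
```

The two printed values should agree up to rounding, since the cross-product term in the expansion sums to zero within every level.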
