We can use the model presented in Equation 9.4 to ask a series of questions such as: (1)
Does shape (Y) depend on either or both of the factors A and B? (2) Is there an interaction
between the factors? (3) Does shape depend on a covariate X? (4) Does the slope of Y on X
depend on one or both of the factors? (5) Does the slope depend on the interaction
between the two factors? As well as asking such “yes” or “no” questions, we can also ask
what fraction of the total variation in shape is explained by each term, factor and covariate. The set of techniques collectively used to answer such questions is known as General
Linear Models (GLM).
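Equation 9.4 is not reproduced in this excerpt, but a two-factor ANCOVA of the kind described, with slopes that may depend on the factors and their interaction, is conventionally written as follows (the notation here is standard textbook notation, not necessarily that of Equation 9.4):

```latex
Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}
        + \bigl[\gamma + \gamma^{A}_{i} + \gamma^{B}_{j} + \gamma^{AB}_{ij}\bigr]\,X_{ijk}
        + \varepsilon_{ijk}
```

Here the terms map onto the five questions above: the factor effects α_i and β_j (question 1), their interaction (αβ)_ij (question 2), the common slope γ on the covariate X (question 3), and the γ^A, γ^B and γ^AB terms, which allow that slope to depend on each factor and on their interaction (questions 4 and 5); ε_ijk is the error term.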
GLM include a variety of models, such as the linear regression and two-group comparisons of the last chapter, plus techniques such as analysis of variance (ANOVA), analysis
of covariance (ANCOVA) and their multivariate equivalents, multivariate analysis of variance (MANOVA) and multivariate analysis of covariance (MANCOVA), as well as multiple regression
and a range of other models (not all of which have names). The model described above,
which has two factors, A and B, and a covariate X, would be called a two-factor ANCOVA
(or MANCOVA). Like the family of ANOVA methods, GLM methods use the distribution
of sums of squares (or mean squares), which are proportional to the variance contributed
by each factor (and covariate) plus the error terms. When the data are univariate, the sums
of squares are scalars, but when the data are multivariate, they are "sums of
squares and cross products" (SSCP) matrices. Whether univariate or multivariate, sums of
squares are used to form F-ratios for hypothesis testing and to estimate (and decompose)
the variation explained. Whether the data are univariate or multivariate, the error term (ε)
represents the residual unexplained variance or "noise". Different methods are required
when using SSCP matrices for hypothesis testing, but the fundamental concepts are the
same in both the univariate and multivariate cases.
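To make the univariate/multivariate parallel concrete, here is a minimal sketch in Python (with made-up illustrative numbers, not data from the text) of how a scalar sum of squares generalizes to an SSCP matrix:

```python
# Toy data: four specimens measured on two shape variables.
y1 = [2.0, 4.0, 6.0, 8.0]
y2 = [1.0, 3.0, 2.0, 4.0]

def center(v):
    """Subtract the mean from each value."""
    m = sum(v) / len(v)
    return [x - m for x in v]

c1, c2 = center(y1), center(y2)

# Univariate: the sum of squares is a single scalar.
ss1 = sum(x * x for x in c1)

# Multivariate: the SSCP matrix holds sums of squares on the
# diagonal and sums of cross products off the diagonal.
sscp = [
    [sum(a * b for a, b in zip(c1, c1)), sum(a * b for a, b in zip(c1, c2))],
    [sum(a * b for a, b in zip(c2, c1)), sum(a * b for a, b in zip(c2, c2))],
]

print(ss1)   # scalar sum of squares for the first variable
print(sscp)  # 2 x 2 SSCP matrix for both variables
```

In a full analysis, separate SSCP matrices of this form would be computed for each factor and for the error term, just as separate scalar sums of squares are in a univariate ANOVA.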
This chapter will discuss GLM, beginning with applications to univariate data to lay the
foundation, then extending them to multivariate data. We consider both classical statistical
models and those tested by permuting the data. We begin with a general overview of fac-
tors and experimental design because that design has a major impact on the efficacy of
these statistical methods. One of the most important considerations is the number of factors to be tested because the number of interaction terms grows with the number of factors. For example, in a model containing three factors (A, B and C) there are four interaction terms: A × B, A × C, B × C and A × B × C. Adding a fourth factor increases the number of interaction terms to 11, and so on. The rapid growth of interaction terms, and the attendant increase in the number of parameters that must be estimated, makes the effective use of complex models a daunting task even though they are powerful.
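The growth described above can be checked directly: with k factors, every subset of two or more factors contributes an interaction term, giving 2^k − k − 1 terms in all. A short sketch, assuming nothing beyond the Python standard library:

```python
from itertools import combinations

def interaction_terms(factors):
    """List every interaction among the given factors (pairs and higher)."""
    terms = []
    for r in range(2, len(factors) + 1):
        for combo in combinations(factors, r):
            terms.append(" x ".join(combo))
    return terms

three = interaction_terms(["A", "B", "C"])
print(len(three), three)   # 4 terms: A x B, A x C, B x C, A x B x C
print(len(interaction_terms(["A", "B", "C", "D"])))  # 11 terms
```

With five factors the count is already 26, which illustrates why each added factor makes the model so much harder to estimate and interpret.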
FACTORS AND EXPERIMENTAL DESIGN
The nature of the factors in a particular experiment, and the manner in which data
are collected, will both have a major impact on how the data can be analyzed, as well as on
the types of questions that can be addressed effectively. We first explain the distinction
between fixed and random factors, then the distinction between crossed and nested factors
and then the distinction between main effects and interactions. After that, we focus on
the distinction between balanced and unbalanced designs because the procedures for