which is clearly linear in both fitted parameters, m and b.
A function such as:
$$Y = m e^{X} + \varepsilon \tag{9.2}$$
is still linear in the fitted parameter m, even if not in the independent variable X. Because
it is linear in m, this expression is an example of a General Linear Model. However, an
expression such as:
$$Y = m e^{\alpha X} + \varepsilon \tag{9.3}$$
is not within the family of General Linear Models because it is not linear in the fitted
parameter $\alpha$.
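The practical consequence of linearity in the fitted parameter can be sketched numerically. The example below is only an illustration under stated assumptions: the data are simulated, the value `true_m = 2.5` is hypothetical, and NumPy's `np.linalg.lstsq` stands in for whatever least-squares routine one prefers. Because the model of Eq. 9.2 is linear in m, we can treat $e^{X}$ as the regressor and estimate m in a single least-squares step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data from Y = m * exp(X) + error (hypothetical true m of 2.5).
true_m = 2.5
X = rng.uniform(0.0, 2.0, size=200)
Y = true_m * np.exp(X) + rng.normal(0.0, 0.1, size=200)

# The model is linear in m, so exp(X) serves as the single regressor
# and ordinary least squares yields m directly (no intercept column).
regressor = np.exp(X).reshape(-1, 1)
coef, *_ = np.linalg.lstsq(regressor, Y, rcond=None)
m_hat = coef[0]

print(f"estimated m: {m_hat:.3f}")
```

No comparable transformation is available for Eq. 9.3 while $\alpha$ is unknown, because the regressor $e^{\alpha X}$ itself depends on the parameter being fitted; such a model would ordinarily be fitted by an iterative nonlinear method instead.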
Working with General Linear Models allows us to use a wide range of models, but we
are most likely to be interested in simple extensions of the linear regression model that we
discussed in the last chapter. From that starting point, we can build more complex models
by incorporating both continuous and additional categorical independent variables. For
example, our response variable, Y, might depend on categorical factors A and B plus a
continuous variable X, as well as on the unknown sources of the error term $\varepsilon$. This
dependence is written as:
$$Y_i = A_i + B_i + A_i \times B_i + \beta_x(A, B, A \times B)\,X_i + \varepsilon_i \tag{9.4}$$
where $Y_i$ is the dependent variable (either univariate or multivariate) for the ith specimen.
$Y_i$ can be either a simple number (a scalar), such as centroid size, or a vector of K real
values for multivariate data, such as shape. A and B are categorical variables, which are
usually termed “factors”, and X is a continuous variable, which is often termed a “covariate”.
“$A_i \times B_i$” is known as a “crossed” or “interaction” term, meaning that factor A's
impact on $Y_i$ depends on the level of factor B to which the ith specimen belongs. In this
model, the slope term, $\beta_x$, is a function of A, B and the interaction term $A \times B$.
In the univariate case, the fitted terms $A_i$, $B_i$, $A_i \times B_i$ and
$\beta_x(A, B, A \times B)$ are scalars, whereas in the multivariate case they are vectors
having as many coefficients as there are variables, and so is the error term $\varepsilon_i$.
The covariates $X_i$ and Z, however, are univariate in each case.
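To make the structure of Eq. 9.4 concrete, the following is a minimal sketch for the univariate case. It rests on several assumptions that are not part of the text: each factor has just two levels coded 0/1 (dummy coding), the data are simulated from hypothetical coefficients (`true_coef`), and an explicit intercept column is included because the simulated variables are not centered. The slope $\beta_x(A, B, A \times B)$ is represented by the X column together with its products with the factor columns.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400

# Hypothetical two-level factors (coded 0/1) and a continuous covariate.
a = rng.integers(0, 2, size=n)
b = rng.integers(0, 2, size=n)
x = rng.normal(size=n)

# Design matrix columns:
# intercept, A, B, A x B, X, A*X, B*X, (A x B)*X
design = np.column_stack(
    [np.ones(n), a, b, a * b, x, a * x, b * x, a * b * x]
)

# Simulate a univariate response from assumed coefficients plus error.
true_coef = np.array([1.0, 0.5, -0.3, 0.8, 2.0, 0.4, -0.6, 0.5])
y = design @ true_coef + rng.normal(0.0, 0.1, size=n)

# The model is linear in every coefficient, so least squares fits it directly.
coef_hat, *_ = np.linalg.lstsq(design, y, rcond=None)


def slope_for(a_level, b_level):
    """Fitted slope on X for one combination of factor levels."""
    return (
        coef_hat[4]
        + coef_hat[5] * a_level
        + coef_hat[6] * b_level
        + coef_hat[7] * a_level * b_level
    )


for a_level in (0, 1):
    for b_level in (0, 1):
        print(f"A={a_level}, B={b_level}: slope = {slope_for(a_level, b_level):.2f}")
```

In the multivariate case the same design matrix could be used with a response matrix that has one column per variable, so that each fitted term becomes a vector of coefficients rather than a scalar.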
Just as we saw in our discussion of ANOVA in the last chapter, the factors are categorical
variables such as sex, diet class or species, and the covariates are continuously valued
variables such as size, position along a geographic transect or fitness.
Throughout this chapter we will assume that the continuous variables (Y and X) are centered,
i.e. their mean values are zero. When these variables are not centered, the model
must include an explicit term for the mean value (and the data retain the additional
degree of freedom that is otherwise lost in the process of centering). In the case of shape
data, Y is typically shape, expressed in terms of the difference in shape between each
specimen and the mean. Each factor has two or more levels (i.e. distinct values of the
factor) corresponding to the number of groups. For example, in the case of sexual dimorphism
that we discussed in the last chapter, the factor is sex, which has two levels:
“male” and “female”. Many factors have more than two levels; habitat, for example, has
(among other possibilities): “montane”, “mid-elevation”, “desert”, “lacustrine”,
“riverine” and “marine”.
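The role of centering, and the idea of factor levels, can both be illustrated with a short sketch. Everything in it is assumed for illustration: the data are simulated, and the variable names and the small `habitat` sample are hypothetical. The first part checks that the slope from a regression through the origin on centered data matches the slope from a model with an explicit mean (intercept) term on the raw data; the second part turns the levels of a factor into indicator columns.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated, uncentered covariate and response.
x = rng.normal(10.0, 2.0, size=100)
y = 3.0 + 0.7 * x + rng.normal(0.0, 0.5, size=100)

# Uncentered data: the model needs an explicit term for the mean.
design = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
slope_raw = coef[1]

# Centered data: subtract the means, then fit the slope with no intercept.
xc = x - x.mean()
yc = y - y.mean()
slope_centered = (xc @ yc) / (xc @ xc)

print(f"slope with intercept: {slope_raw:.4f}")
print(f"slope after centering: {slope_centered:.4f}")

# Factor levels: each distinct value of the factor is one level.
habitat = np.array(
    ["montane", "desert", "marine", "desert", "riverine", "montane"]
)
levels, codes = np.unique(habitat, return_inverse=True)

# One indicator column per level; each specimen belongs to exactly one level.
indicators = np.eye(len(levels))[codes]
print(levels)
print(indicators)
```

The two slopes agree because centering absorbs the mean term, which is why the centered model can omit it.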