four major analytic test statistics: Wilks' Lambda, Pillai's trace, the Hotelling-Lawley
trace and
Roy's largest root, all based on the properties of the eigenvalues of these matrices (Quinn
and Keogh, 2002). Pillai's trace is thought to be the most robust (Johnson and Field, 1993),
but all the tests should converge in the large sample limit. The analytic models used in the
tests based on these statistics assume that the SSCP matrices follow the Wishart distribution,
which means that the matrices are of the form $SS = Y'Y$, where $Y$ is a centered matrix (i.e.
the column means are equal to zero) whose elements are all normally distributed. The
variances need not be equal across all the variables, and the variables can also be
correlated. The tests are thought to be reasonably robust to violations of the assumption
of normality, although tests of normality are available. The methods also assume equality of
the error variance-covariance matrices within each factor. Details of these tests may be
found elsewhere (Mardia et al., 1979; Rencher, 1995; Searle, 1997, 2006; Quinn and Keogh,
2002; and references therein) and most software packages that carry out GLM or MANOVA
or MANCOVA will report several, if not all, of these tests.
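As a minimal sketch of how these four statistics relate to the eigenvalues, assume that H and E denote the hypothesis and error SSCP matrices and that the eigenvalues in question are those of E^{-1}H. The names H, E and manova_statistics are chosen here for illustration, and the data are simulated. The sketch computes only the statistics themselves, not their approximate F distributions or p-values.

```python
import numpy as np

def manova_statistics(H, E):
    """Four classical MANOVA statistics from hypothesis (H) and error (E) SSCP matrices."""
    # Eigenvalues of E^{-1} H; solve() avoids forming the inverse explicitly.
    eigvals = np.linalg.eigvals(np.linalg.solve(E, H)).real
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative values due to rounding
    return {
        "wilks_lambda": float(np.prod(1.0 / (1.0 + eigvals))),
        "pillai_trace": float(np.sum(eigvals / (1.0 + eigvals))),
        "hotelling_lawley_trace": float(np.sum(eigvals)),
        "roy_largest_root": float(np.max(eigvals)),
    }

# Simulated example (assumption): three groups of 20 specimens, three variables.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1, 2], 20)
Y = rng.normal(size=(60, 3))
Y[groups == 1] += np.array([1.0, 0.0, 0.5])  # shift one group mean

grand_mean = Y.mean(axis=0)
H = np.zeros((3, 3))
E = np.zeros((3, 3))
for g in np.unique(groups):
    Yg = Y[groups == g]
    d = (Yg.mean(axis=0) - grand_mean)[:, None]
    H += len(Yg) * (d @ d.T)      # between-group (hypothesis) SSCP
    R = Yg - Yg.mean(axis=0)
    E += R.T @ R                  # within-group (error) SSCP

stats = manova_statistics(H, E)
print(stats)
```

Some packages report Roy's largest root as the largest eigenvalue divided by one plus that eigenvalue, so values from this sketch may differ from software output for that statistic.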
For the statistical analysis of shape data, the major issue is the need to estimate these
matrices accurately, and to invert the sum of squares and cross products matrices. A matrix
cannot be inverted unless it is of full rank, which will not be the case when there are fewer
degrees of freedom in the data than there are measured variables. Even when the degrees of
freedom are relatively close to the number of variables, these tests can yield wildly inaccurate
results when applied to relatively small data sets. One of us (HDS) has observed a substantial
overestimate of the variance explained by factors that are estimated using SSCP matrices at a
sample size roughly two to three times the degrees of freedom in the data set. One rule of
thumb is that we need four times as many specimens as landmarks (Bookstein, 1996).
When the data contain a large number of semilandmarks, it will be difficult to invert these
SSCP matrices and even if we have only landmarks we will still have more variables than
degrees of freedom. Using semilandmarks exacerbates the problem because a semilandmark
in two dimensions has two coordinates but only one degree of freedom, and using enough
semilandmarks to get good coverage of the morphology increases the number of specimens
needed to estimate the SSCP matrices. For these reasons, permutation methods based on Procrustes
distances appear to be more useful approaches for shape data.
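The rank problem can be illustrated numerically. The sketch below uses simulated data with hypothetical sample sizes and variable counts, not real shape coordinates. It shows that the SSCP matrix of a column-centered data matrix has rank at most n - 1, so it cannot be inverted when there are fewer specimens than variables. Real shape data lose further degrees of freedom to superimposition and to semilandmark sliding, which this sketch does not model.

```python
import numpy as np

def sscp_rank(n_specimens, n_vars, seed=1):
    """Rank of the SSCP matrix Y'Y for a simulated, column-centered data matrix."""
    rng = np.random.default_rng(seed)
    Y = rng.normal(size=(n_specimens, n_vars))
    Y -= Y.mean(axis=0)              # centering removes one degree of freedom
    sscp = Y.T @ Y                   # n_vars x n_vars
    return int(np.linalg.matrix_rank(sscp))

# Hypothetical small sample: 10 specimens, 24 variables.
rank_small = sscp_rank(10, 24)
# Hypothetical large sample: 200 specimens, same 24 variables.
rank_large = sscp_rank(200, 24)

print(rank_small, rank_large)
```

In the small-sample case the 24 x 24 SSCP matrix is rank deficient and therefore singular, while the large-sample matrix reaches full rank.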
PERMUTATION APPROACHES TO GENERAL LINEAR MODELS
McArdle and Anderson (2001) note that the information contained in the sum of
squares and cross products matrix, $SS_{Total} = Y'Y$, of any centered matrix $Y$, which is also
referred to as the “inner product” matrix, is also contained in the “outer product” matrix
$YY'$, obtained from the matrix of pairwise distances among the n specimens. One approach
to hypothesis testing follows from this, which is to form pseudo-F statistics (McArdle and
Anderson, 2001). Remember that a single factor MANOVA based on the model
$$Y = XB + \varepsilon \qquad (9.40)$$
has an F-test of the form:
$$F = \frac{SS_H / (J - 1)}{SS_{error} / (n - J)} \qquad (9.41)$$
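The link between the inner and outer product matrices, and the resulting pseudo-F statistic, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the data are simulated, plain Euclidean distances stand in for Procrustes distances, J is taken to be the number of groups, and the names pseudo_f and permutation_test are hypothetical. The sums of squares are computed directly from squared pairwise distances, and significance is assessed by permuting the group labels.

```python
import numpy as np

def pseudo_f(D, groups):
    """Single-factor pseudo-F from an n x n matrix D of pairwise distances."""
    n = D.shape[0]
    labels = np.unique(groups)
    J = len(labels)
    D2 = D ** 2
    # Each pair appears twice in the full matrix, hence the factor of 2.
    ss_total = D2.sum() / (2.0 * n)
    ss_error = 0.0
    for g in labels:
        idx = np.flatnonzero(groups == g)
        ss_error += D2[np.ix_(idx, idx)].sum() / (2.0 * len(idx))
    ss_h = ss_total - ss_error
    return (ss_h / (J - 1)) / (ss_error / (n - J))

def permutation_test(D, groups, n_perm=199, seed=0):
    """Observed pseudo-F and its permutation p-value (labels shuffled at random)."""
    rng = np.random.default_rng(seed)
    observed = pseudo_f(D, groups)
    count = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        if pseudo_f(D, rng.permutation(groups)) >= observed:
            count += 1
    return observed, count / (n_perm + 1)

# Simulated data (assumption): 3 groups of 10 specimens, 6 variables.
rng = np.random.default_rng(2)
groups = np.repeat([0, 1, 2], 10)
Y = rng.normal(size=(30, 6))
Y[groups == 2] += 1.5
Yc = Y - Y.mean(axis=0)                                  # centered data matrix
D = np.linalg.norm(Yc[:, None, :] - Yc[None, :, :], axis=2)

# Outer product matrix recovered from the distances by double centering.
n = len(Yc)
C = np.eye(n) - np.ones((n, n)) / n
G = -0.5 * C @ (D ** 2) @ C
ss_total_inner = np.trace(Yc.T @ Yc)   # trace of the inner product matrix Y'Y
ss_total_outer = np.trace(G)           # trace of the outer product matrix YY'

F_obs, p_value = permutation_test(D, groups)
print(ss_total_inner, ss_total_outer, F_obs, p_value)
```

Because only the distance matrix enters pseudo_f, no SSCP matrix has to be inverted, which is what makes this route usable when there are more variables than specimens.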