In this expression, $P$ is the number of variables, $W = N - B - 1$ (where $N$ is the total
number of individuals) and $B = G - 1$ (where $G$ is the number of groups). The degrees of
freedom are determined by the product of $P$ and $B$.
The testing procedure begins by computing the estimated $\chi^2$, in which $\Lambda$ is the product
of the eigenvalues of all CVs. If this value is significantly greater than expected for the
given degrees of freedom, we can infer that there are statistically significant differences
among the groups. In the squirrel jaw example, there are three groups and 26 shape variables,
and the maximum possible number of meaningful CVs is two. Bartlett's test on both
CVs yields a $\chi^2$ of 206.6, with 52 degrees of freedom, for a p-value less than 0.000001. This
result indicates that at least some of the groups in the study can be discriminated using
scores on these two CVs.
We do not yet know whether both CVs contribute to discrimination of the groups, so
the next step is to remove the eigenvalue for the first CV (the most efficient discriminator)
and repeat the test. Reducing the number of CVs reduces the number of groups that can
be discriminated, which reduces B by 1 and the degrees of freedom by P . These changes
produce a $\chi^2$ of 83.5 with 26 degrees of freedom, for a p-value that is still less than
0.000001. Thus, the second CV also contributes to discriminating among the groups.
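As a quick check, the degrees of freedom in the squirrel jaw example follow directly from the rule df = P × B, with P = 26 shape variables and G = 3 groups, and from removing one CV (reducing B by 1) before the second test:

```latex
\begin{aligned}
B &= G - 1 = 3 - 1 = 2\\
\mathrm{df}_{\text{both CVs}} &= P \times B = 26 \times 2 = 52\\
\mathrm{df}_{\text{second CV only}} &= P \times (B - 1) = 26 \times 1 = 26
\end{aligned}
```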
In general, the test is reiterated using the remaining $R$ ($= B - i$, where $i$ is the number of
eigenvalues removed so far) eigenvectors until $R = 0$
(all eigenvalues have been removed) or some set of $R$ remaining eigenvectors fails the test.
If $R$ goes to zero, the analysis will have shown that some groups can be discriminated on
the CV that is the least efficient discriminator. If a set of $R$ eigenvectors fails the test, then
only the first $B - R$ CVs contribute to discriminating among the groups. Note that the test
cannot be taken to indicate that all groups can be discriminated, and it does not indicate
which groups can be discriminated.
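The sequential procedure can be sketched in Python. This is a minimal sketch under stated assumptions, not the book's own implementation: the statistic uses the standard Bartlett multiplier N − 1 − (P + G)/2 (algebraically equal to W − (P − B + 1)/2 with W and B as defined above), Λ for the remaining R CVs is taken as the product of 1/(1 + λ_j) over their eigenvalues λ_j, and the degrees of freedom follow the text's rule P × R. The names `chi2_sf` and `bartlett_sequential` are hypothetical, and the chi-square tail probability comes from a hand-written incomplete-gamma routine so that no external library is required.

```python
import math


def chi2_sf(x: float, k: float) -> float:
    """Upper-tail probability P(chi2_k > x) = Q(k/2, x/2).

    Regularized incomplete gamma via a series (small x) or a Lentz
    continued fraction (large x), in the style of Numerical Recipes.
    """
    a, z = k / 2.0, x / 2.0
    if z <= 0.0:
        return 1.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series for the lower regularized gamma P(a, z); return 1 - P.
        term = total = 1.0 / a
        n = a
        while abs(term) > abs(total) * 1e-15:
            n += 1.0
            term *= z / n
            total += term
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for the upper regularized gamma Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def bartlett_sequential(eigenvalues, n_specimens, n_vars, n_groups, alpha=0.05):
    """Hypothetical helper: sequential Bartlett tests on CV eigenvalues.

    Returns (number of CVs that contribute, list of (R, chi2, df, p) rows).
    """
    # Assumption: standard Bartlett multiplier N - 1 - (P + G) / 2.
    multiplier = n_specimens - 1 - (n_vars + n_groups) / 2.0
    b = n_groups - 1
    # Most efficient discriminator first; at most B = G - 1 meaningful CVs.
    lams = sorted(eigenvalues, reverse=True)[:b]
    n_contributing = len(lams)  # all CVs contribute unless some set fails
    rows = []
    for i in range(len(lams)):
        remaining = lams[i:]  # the R = B - i CVs still in the test
        r = len(remaining)
        # -ln(Lambda) with Lambda = prod 1 / (1 + lambda_j)
        chi2 = multiplier * sum(math.log1p(lam) for lam in remaining)
        df = n_vars * r  # the text's rule: P times the number of remaining CVs
        p = chi2_sf(chi2, df)
        rows.append((r, chi2, df, p))
        if p >= alpha:
            n_contributing = i  # only the first B - R CVs contribute
            break
    return n_contributing, rows


if __name__ == "__main__":
    # Hypothetical eigenvalues for illustration only (not the squirrel jaw data).
    n_sig, rows = bartlett_sequential([3.0, 0.01], n_specimens=60, n_vars=4, n_groups=3)
    for r, chi2, df, p in rows:
        print(f"R={r}: chi2={chi2:.2f}, df={df}, p={p:.3g}")
    print("CVs that contribute:", n_sig)
```

In this toy run the first test (both CVs) is significant and the second (the smaller eigenvalue alone) is not, so only the first CV would be retained. Note that, as the text cautions, a passing result says nothing about which particular groups are separated.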
The utility of the CVs for discriminating among groups can also be evaluated using the
Mahalanobis distances of specimens from the group means. The means are computed using
the a priori group assignments. The Mahalanobis distance between a specimen $X$ and the
mean $M$ of a group is given by:
$$D = \sqrt{(X - M)^{T} S^{-1} (X - M)} \qquad (6.44)$$
where $S^{-1}$ is the inverse of the variance-covariance matrix of the CV scores of the
specimens. The predicted group membership of each specimen based on the scores is
determined by assigning each specimen to the group whose mean is closest (under the
Mahalanobis distance) to the specimen. All of the CVs that pass Bartlett's test, and only
those CVs, are used to compute the Mahalanobis distances and assign specimens to
groups.
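A minimal NumPy sketch of the assignment rule built on Equation 6.44 follows. It assumes the CV scores have already been computed and restricted to the CVs that passed Bartlett's test, and it reads $S$ as the variance-covariance matrix of all specimens' scores, as the text describes (a pooled within-group covariance matrix is a common alternative). The function name `mahalanobis_assign` and the toy data are hypothetical.

```python
import numpy as np


def mahalanobis_assign(scores, labels):
    """Assign each specimen to the group with the nearest mean (Eq. 6.44).

    scores: (n_specimens, n_cvs) CV scores, only for CVs that passed Bartlett's test.
    labels: length-n_specimens a priori group labels, used to compute group means.
    Returns (predicted labels, distance matrix D of shape (n_specimens, n_groups)).
    """
    scores = np.asarray(scores, dtype=float)
    if scores.ndim == 1:
        scores = scores[:, None]  # a single CV as one column
    labels = np.asarray(labels)
    groups = np.unique(labels)
    # Group means from the a priori assignments.
    means = np.array([scores[labels == g].mean(axis=0) for g in groups])
    # S: variance-covariance matrix of the CV scores of all specimens (assumption).
    S = np.atleast_2d(np.cov(scores, rowvar=False))
    S_inv = np.linalg.inv(S)
    # (X - M) for every specimen/group pair: shape (n_specimens, n_groups, n_cvs).
    diffs = scores[:, None, :] - means[None, :, :]
    # Quadratic form (X - M)^T S^-1 (X - M) for each pair, then the square root.
    D = np.sqrt(np.einsum("ngi,ij,ngj->ng", diffs, S_inv, diffs))
    predicted = groups[np.argmin(D, axis=1)]
    return predicted, D


if __name__ == "__main__":
    # Toy scores on two CVs (hypothetical data, not from the squirrel jaw study).
    scores = [[0, 0], [1, 0], [0, 1], [10, 0], [11, 0], [10, 1]]
    labels = ["a", "a", "a", "b", "b", "b"]
    predicted, D = mahalanobis_assign(scores, labels)
    # Proportion of specimens assigned to their a priori group.
    rate = np.mean(predicted == np.asarray(labels))
    print(predicted, rate)
```

Because the same scores are used both to estimate the group means and $S$ and to classify the specimens, the proportion correct printed here is a resubstitution rate.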
When specimens are assigned to groups using CV axes estimated from the same data set, the
resulting rate of correct assignment is referred to as a resubstitution rate. Resubstitution
rates involve a certain degree of circularity, in that the same data were used to create the
discriminant functions (CV axes) and to assess the performance of those functions. This
leads to some over-fitting of the model to the data, and therefore to an overestimate of the
effectiveness of the CVA. A number of approaches have been developed to produce more
realistic estimates of the actual error rate for a classification method (Knoke, 1986;
Schiavo and Hand, 2000). The actual error rate