After the PCA of the canonical means $\bar{X}L$, we plot in as many dimensions as we
require the PCA approximation $XLV_J$, with the group means $\bar{X}LV_J$. It is interesting to
note that we might have proceeded immediately to the PCA of the canonical variables
XL , which would require the eigenvectors of
$$(XL)'(XL) = L'TL = L'(B + W)L = V\Lambda V' + I = V(\Lambda + I)V'.$$
Thus, we obtain the same eigenvectors as (4.4) with C = N but the between-group
eigenvalues are increased by unity, indicating the inclusion of the between-group variation
in the m dominant dimensions. Of course, working with the group means achieves the
same ends by using a smaller calculation, but this result emphasizes that we are essentially
concerned with a simple PCA of the canonical variables.
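As a concrete check of this point, the following sketch (a NumPy/SciPy illustration with simulated data; the symbols B, W, L and $\Lambda$ follow the text, while the group sizes, dimensions and random seed are purely illustrative, and the data are assumed to be centred at the grand mean, consistent with C = N) verifies numerically that the eigenvalues of $(XL)'(XL)$ are those of $L'BL$ increased by unity.

```python
import numpy as np
from scipy.linalg import cholesky, eigh

rng = np.random.default_rng(0)
K, p, n_per = 3, 4, 50                         # groups, variables, samples per group
mu = rng.normal(scale=2.0, size=(K, p))        # hypothetical group means
X = np.vstack([mu[k] + rng.normal(size=(n_per, p)) for k in range(K)])
g = np.repeat(np.arange(K), n_per)

Xbar = np.vstack([X[g == k].mean(axis=0) for k in range(K)])   # group means
W = sum((X[g == k] - Xbar[k]).T @ (X[g == k] - Xbar[k]) for k in range(K))
N = np.diag(np.bincount(g).astype(float))
grand = X.mean(axis=0)
B = (Xbar - grand).T @ N @ (Xbar - grand)      # between-group SSP with C = N

L = np.linalg.inv(cholesky(W, lower=False))    # canonical transformation: L'WL = I
lam = eigh(L.T @ B @ L, eigvals_only=True)[::-1]   # eigenvalues of L'BL, descending

# Cross-product of the centred canonical variables: L'(B + W)L = V(Lambda + I)V'.
XL = (X - grand) @ L
lam_total = eigh(XL.T @ XL, eigvals_only=True)[::-1]
print(np.allclose(lam_total, lam + 1.0))       # between-group eigenvalues + 1
```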
We note that the two-sided equation also arises from finding the linear combination $x'\beta$
of the variables that maximizes the ratio of the between-class to the within-class variance,
$$\frac{\beta'(\bar{X}'C\bar{X})\beta}{\beta'W\beta}. \qquad (4.7)$$
This is a useful property, but it gives only a one-dimensional solution for $\beta$. The two-step
approach above, however, fully justifies the retention of the remaining eigenvectors, albeit
without the variance-ratio justification, unless one accepts the often-quoted property that
the $r$th vector maximizes the ratio conditional on being orthogonal (in the metric W)
to the previous $r - 1$ vectors. In our approach, this property is satisfied globally as a
natural consequence of the least-squares property of the SVD that, in turn, generates the
two-sided eigenvector equation. The one-dimensional solution, given by the first eigenvector
$Lv_1$ of the two-sided eigenvalue equation, is prominent in the statistical literature
as the linear discriminant function (LDF), especially when K = 2 and therefore m = 1.
It offers a linear combination of the variables with optimal discriminatory properties (i.e.
maximizing the variance ratio (4.7)). We have seen that Mahalanobis distance discriminates
by using all dimensions and that CVA, with its classification regions, supports
discrimination using one or more dimensions to approximate Mahalanobis distance. The
LDF is merely the one-dimensional version.
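To make the maximization concrete, the self-contained sketch below (again with simulated data, here for the classical two-group case; only the symbols B, W, L, $Lv_1$ and the ratio (4.7) come from the text) checks that the first eigenvector $Lv_1$ attains the maximum of the variance ratio, that this maximum equals the leading eigenvalue $\lambda_1$, and that no randomly chosen $\beta$ does better.

```python
import numpy as np
from scipy.linalg import cholesky, eigh

rng = np.random.default_rng(0)
K, p, n_per = 2, 5, 40                         # two groups: the classical LDF setting
mu = rng.normal(scale=2.0, size=(K, p))
X = np.vstack([mu[k] + rng.normal(size=(n_per, p)) for k in range(K)])
g = np.repeat(np.arange(K), n_per)

Xbar = np.vstack([X[g == k].mean(axis=0) for k in range(K)])
W = sum((X[g == k] - Xbar[k]).T @ (X[g == k] - Xbar[k]) for k in range(K))
dev = Xbar - X.mean(axis=0)
B = n_per * dev.T @ dev                        # between-group SSP (equal group sizes)

L = np.linalg.inv(cholesky(W, lower=False))    # L'WL = I
lam, V = eigh(L.T @ B @ L)
lam, V = lam[::-1], V[:, ::-1]                 # descending eigenvalues

def ratio(b):
    """The between- to within-class variance ratio (4.7) for direction b."""
    return (b @ B @ b) / (b @ W @ b)

ldf = L @ V[:, 0]                              # first eigenvector Lv1: the LDF
trials = rng.normal(size=(10_000, p))          # random candidate directions beta
print(all(ratio(ldf) >= ratio(b) for b in trials))   # no random beta does better
print(np.isclose(ratio(ldf), lam[0]))                # the maximum equals lambda_1
```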
Finally, we discuss the matrix C. In PCA we require the data to be expressed in
deviations from the column means, thus ensuring that the best-fitting plane contains
the centroid. In CVA we have the choice of whether to work in deviations from the
centroid of the group means, weighted or not by the group sizes. In the unweighted case,
$C = I - K^{-1}11'$; in the weighted case, C = N. The choice makes no difference to
Mahalanobis distance but ensures that the best-fitting plane goes through either the
unweighted centroid of the group means or their weighted centroid. Lower-dimensional
approximations are affected, as might be expected. In the unweighted form, all group
means are treated equally, irrespective of sample sizes. In the weighted form the
groups with the most samples will be better represented than those with few samples.
Circumstances will dictate which is the better. If one wants a general appreciation of
the Mahalanobis distances between the groups then the unweighted form seems better.
If one were classifying a new sample, as with the stinkwood example, then the weighted
form would seem to be the better. However, with the stinkwood example, it makes no
difference which form is used, as both give an exact representation of the three group
means.
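As an illustration of these remarks, the sketch below uses simulated data with deliberately unequal group sizes (the two forms of C are those given above; as a working convention it centres the group means at the weighted or unweighted centroid before applying C = N or the identity weighting, respectively). It confirms that the inter-group Mahalanobis-type distances are identical under either choice, while the leading axis of the best-fitting plane, and hence any lower-dimensional approximation, differs.

```python
import numpy as np
from scipy.linalg import cholesky, eigh
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
sizes = np.array([10, 30, 60, 160])            # deliberately unequal group sizes
K, p = len(sizes), 5
mu = rng.normal(scale=2.0, size=(K, p))
X = np.vstack([mu[k] + rng.normal(size=(n, p)) for k, n in enumerate(sizes)])
g = np.repeat(np.arange(K), sizes)

Xbar = np.vstack([X[g == k].mean(axis=0) for k in range(K)])
W = sum((X[g == k] - Xbar[k]).T @ (X[g == k] - Xbar[k]) for k in range(K))
L = np.linalg.inv(cholesky(W, lower=False))    # L'WL = I, hence LL' = W^{-1}

# Canonical means centred at the weighted and at the unweighted centroid.
Yw = (Xbar - sizes @ Xbar / sizes.sum()) @ L
Yu = (Xbar - Xbar.mean(axis=0)) @ L

# Distances between canonical means are Mahalanobis-type distances (up to the
# usual degrees-of-freedom scaling); they do not depend on the centring used.
print(np.allclose(pdist(Yw), pdist(Yu)))

def leading_axis(Y, C):
    """Leading eigenvector of Y'CY: the first axis of the best-fitting plane."""
    _, V = eigh(Y.T @ C @ Y)
    return V[:, -1]

v_w = leading_axis(Yw, np.diag(sizes.astype(float)))   # weighted form: C = N
v_u = leading_axis(Yu, np.eye(K))    # unweighted form (Yu already centred)
print(abs(v_w @ v_u))                # typically < 1: the fitted axes differ
```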