4 Canonical variate analysis biplots
In contrast to PCA, canonical variate analysis (CVA) focuses on observations grouped into
K classes. We shall be interested in both between- and within-class variation, particularly
as to how these may be exhibited in graphical form. The main tool of CVA is to transform
the observed variables into what are termed canonical variables that have the property
that the squared distances between the means of the groups are given by Mahalanobis's
$D^2$, defined formally in Section 4.2. Mahalanobis distance is monotonically related to the
probability of misclassification when assigning a sample to one of two groups each with
a multinormal distribution with the same covariance matrix but with different means.
This probability of misclassification is given by
$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-D/2} e^{-\frac{1}{2} y^{2}} \, dy$$
(see Rao, 1952). This result establishes a close relationship with discriminant analysis
which, as we shall see, extends to biplot applications. However, the Mahalanobis distances
are of interest in their own right and we shall approximate them in graphical displays
with associated axes calibrated to give the values of the original variables.
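To make the monotonic relationship concrete, the integral above is the standard normal lower-tail probability evaluated at -D/2, which can be written with the complementary error function. The following is a minimal sketch rather than code from the text: the function name misclassification_probability is hypothetical, and its argument is the distance D itself, not D squared.

```python
import math


def misclassification_probability(d):
    """Standard normal lower-tail probability at -d/2.

    `d` is the Mahalanobis distance D between the two group means
    (hypothetical helper, introduced only for illustration).
    """
    if d < 0:
        raise ValueError("a distance must be non-negative")
    # Phi(-d/2) = 0.5 * erfc(d / (2 * sqrt(2)))
    return 0.5 * math.erfc(d / (2.0 * math.sqrt(2.0)))


# The probability falls steadily as the group means move apart.
for d in (0.0, 1.0, 2.0, 4.0):
    p = misclassification_probability(d)
    print(f"D = {d:3.1f}  P(misclassification) = {p:.4f}")
```

When D = 0 the two groups coincide and the probability is 0.5, which is no better than guessing; as D grows the probability decreases towards zero.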
Having transformed to canonical variables, the table of the group means may be
treated for biplot purposes in a similar way to the PCA analysis of X, described in
Chapter 3. Unlike PCA, CVA is invariant to the measurement scales used, thus avoiding
scaling problems.
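Both properties can be checked numerically in a small two-variable case. The sketch below makes several assumptions that are not taken from the text: the within-class covariance matrix W and the two group means are hypothetical values, the squared distance is computed as (x - y)' W^{-1} (x - y) in anticipation of the formal definition in Section 4.2, and the transformation used is the inverse Cholesky factor of W. The latter is only one of several transformations with the required distance property; canonical variables as usually defined may differ from it by a rotation, which leaves distances unchanged.

```python
import math


def inverse_cholesky_2x2(w):
    """Return L^{-1}, where W = L L' and L is lower triangular (2 x 2 case only)."""
    a, b, c = w[0][0], w[0][1], w[1][1]
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    return [[1.0 / l11, 0.0], [-l21 / (l11 * l22), 1.0 / l22]]


def transform(m, x):
    """Apply the 2 x 2 matrix m to the vector x."""
    return [m[0][0] * x[0] + m[0][1] * x[1], m[1][0] * x[0] + m[1][1] * x[1]]


def mahalanobis_sq(w, x, y):
    """(x - y)' W^{-1} (x - y), using the explicit inverse of a 2 x 2 matrix."""
    a, b, c = w[0][0], w[0][1], w[1][1]
    det = a * c - b * b
    d0, d1 = x[0] - y[0], x[1] - y[1]
    return (c * d0 * d0 - 2.0 * b * d0 * d1 + a * d1 * d1) / det


def euclidean_sq(x, y):
    """Ordinary squared Euclidean distance."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


# Hypothetical within-class covariance matrix and group means.
W = [[4.0, 2.0], [2.0, 3.0]]
mean_a = [5.0, 3.0]
mean_b = [3.0, 2.0]

# Squared distance in the original variables versus the transformed ones.
T = inverse_cholesky_2x2(W)
d2_direct = mahalanobis_sq(W, mean_a, mean_b)
d2_transformed = euclidean_sq(transform(T, mean_a), transform(T, mean_b))

# Re-express the first variable in units ten times smaller.
s = 10.0
W_scaled = [[s * s * W[0][0], s * W[0][1]], [s * W[1][0], W[1][1]]]
mean_a_scaled = [s * mean_a[0], mean_a[1]]
mean_b_scaled = [s * mean_b[0], mean_b[1]]
d2_rescaled = mahalanobis_sq(W_scaled, mean_a_scaled, mean_b_scaled)

print(d2_direct, d2_transformed, d2_rescaled)
```

The three printed values agree up to rounding error: ordinary squared Euclidean distance between the transformed means reproduces the squared Mahalanobis distance, and changing the measurement scale of a variable leaves that distance unchanged.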
The algebra may be developed in several ways, but first we establish the context by
discussing a simple example.
4.1 An example: revisiting the Ocotea data
Stinkwood (Ocotea bullata) is a large tree, indigenous to South Africa, belonging to the
family Lauraceae. When the wood is freshly cut it has an unpleasant smell from which