One of the assumptions of CVA is that the within-class covariance matrices are equal
among all classes.
On comparing the CVA biplot in Figure 4.2 with the corresponding PCA biplot in
Figure 4.1 we draw attention to the following:
The PCA biplot is scale-dependent; the CVA biplot is scale-invariant.
Axis predictivities and predictivities for the group means in the PCA biplot are not
equal to unity, but in the CVA biplot they all attain the maximum value of unity
(see Section 4.2).
There is a higher degree of separation between the group means in the CVA biplot
than in the PCA biplot.
In the PCA biplot the interpolated group means do not contribute to the scaffolding
axes; in the CVA biplot the scaffolding axes are determined by the group means.
Before discussing the theoretical basis of these differences for a complete
understanding of CVA biplots, let us study our example in a little more detail. It is clear from
Figure 4.2 that the species are well separated by the CVA, although there is some overlap
between stinkwood (Obul) and imbuia (Opor). The leave-one-out cross validation error
rate was calculated and the proportion of incorrect classifications found to be 0.081.
Since it is well known that the error rate can sometimes be improved by using only a
subset of the variables (see Flury, 1997), the possibility of using fewer variables was
investigated. In Table 4.2 only the smallest leave-one-out cross validation error rate for
each subset size is given, together with the associated subset of variables.
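The search just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used to produce Table 4.2. The classification rule assumed here assigns a sample to the nearest class mean in Mahalanobis distance, using the pooled within-class covariance matrix. The function names (solve, classify, loo_error, best_subsets), the variable names v1, v2, v3 and the small data set are all hypothetical and invented for the example.

```python
from itertools import combinations


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][c] * z[c] for c in range(i + 1, n))) / M[i][i]
    return z


def classify(train_X, train_y, x):
    """Assumed rule: nearest class mean in Mahalanobis distance."""
    p = len(x)
    groups = {}
    for row, label in zip(train_X, train_y):
        groups.setdefault(label, []).append(row)
    means = {g: [sum(col) / len(rows) for col in zip(*rows)]
             for g, rows in groups.items()}
    # Pooled within-class covariance: summed products of deviations / (n - K).
    W = [[0.0] * p for _ in range(p)]
    for g, rows in groups.items():
        for row in rows:
            d = [v - m for v, m in zip(row, means[g])]
            for i in range(p):
                for j in range(p):
                    W[i][j] += d[i] * d[j]
    dof = len(train_X) - len(groups)
    W = [[w / dof for w in r] for r in W]

    def dist(g):
        d = [v - m for v, m in zip(x, means[g])]
        return sum(di * zi for di, zi in zip(d, solve(W, d)))

    return min(means, key=dist)


def loo_error(X, y, cols):
    """Leave-one-out error rate using only the variables indexed by cols."""
    Xs = [[row[c] for c in cols] for row in X]
    wrong = 0
    for i in range(len(Xs)):
        # Refit on all samples except sample i, then classify sample i.
        train_X = Xs[:i] + Xs[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if classify(train_X, train_y, Xs[i]) != y[i]:
            wrong += 1
    return wrong / len(Xs)


def best_subsets(X, y, names):
    """For each subset size, keep the smallest LOO error and its variables."""
    result = {}
    for k in range(1, len(names) + 1):
        scored = [(loo_error(X, y, cols), cols)
                  for cols in combinations(range(len(names)), k)]
        err, cols = min(scored)
        result[k] = (err, [names[c] for c in cols])
    return result


# Made-up toy data (two classes, three variables); not the Ocotea data.
names = ["v1", "v2", "v3"]
X = [[1.0, 5.0, 2.0], [1.2, 3.0, 7.0], [0.8, 4.0, 1.0], [1.1, 6.0, 5.0],
     [3.0, 4.0, 6.0], [3.2, 6.0, 2.0], [2.9, 3.0, 4.0], [3.1, 5.0, 3.0]]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]

for size, (err, chosen) in best_subsets(X, y, names).items():
    print(size, round(err, 3), chosen)
```

The search is exhaustive, which is feasible for six variables (63 nonempty subsets) but grows quickly with the number of variables. Ties between subsets of the same size are broken here by variable order, which is an arbitrary choice.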
Table 4.2 shows that a better classification rate is obtained with the optimal subset
of three, four or five variables than with the complete set of six variables. Based on
the principle of parsimony (Occam's razor) the variables FibL, VesL and VesD were
selected for use in the final analysis. A test for significant differences between the
class covariance matrices, based on the three selected variables, gives a p-value of 0.065,
so the hypothesis of equal covariance matrices is not rejected at the 5% significance level.
After selecting these three variables on a statistical basis, it transpired that there is the
added practical advantage that smaller (thin strip) wood samples are needed to make these
measurements compared to the other three variables. Therefore methods based on only
these features can be viewed as nondestructive, which is an important aspect when taking
wood samples for microscopic analysis from precious old Cape furniture. Furthermore,
very small articles can be identified. The CVA biplot based on variables FibL, VesL and
VesD is given in Figure 4.3.
Table 4.2 Smallest CVA cross validation error rates for different sized
subsets of the variables of the Ocotea data set.

Subset size   Cross validation error rate   Associated variables
1             0.243                         FibL
2             0.081                         FibL, VesL
3             0.054                         FibL, VesL, VesD
4             0.054                         FibL, VesL, VesD, RayW
5             0.054                         FibL, VesL, VesD, RayW, RayH
6             0.081                         complete set