measures for three kinds of centring: (i) weighted by the group sizes, (ii) ignoring group
sizes and (iii) ignoring group sizes but passing through the weighted centroids of the
group means. Perhaps because the sample sizes are not disparate, there is little to choose
between the methods and, of course, with three groups they all give the same two-
dimensional fit, merely distributing the group variation slightly differently between the
first two dimensions (see Tables 4.5 and 4.7). The adequacies of the variables are given
in Table 4.6. As we have explained, these are generally of less interest than predictivities.
We see again that the adequacies do not depend greatly on the centring used and, slightly
surprisingly, once we use two dimensions the type of centring has no effect whatsoever.
This pattern persists in the within-group predictivity (Tables 4.9 and 4.10).
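The measures reported in Tables 4.5-4.10 can be extracted programmatically. The following is an illustrative sketch only: it assumes the UBbipl package accompanying the book, and the component names of the returned list (`Adequacy`, `Within.group.axis.predictivity`) as well as the column selection of the Ocotea data are assumptions, not a verbatim reproduction of the book's code.

```r
# Sketch only: assumes the UBbipl package is installed and that
# CVA.predictivities() returns a list whose components include the
# adequacies and within-group axis predictivities (names assumed).
library(UBbipl)

out <- CVA.predictivities(X = Ocotea.data[, -(1:2)],       # numeric variables (selection assumed)
                          G = indmat(Ocotea.data[, 2]))    # group indicator matrix, as in the text

round(out$Adequacy, 4)                        # cf. Table 4.6
round(out$Within.group.axis.predictivity, 4)  # cf. Tables 4.9 and 4.10
```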
The measures given in the tables are meant to help researchers to detect which
variables and samples are well or not well approximated, so that appropriate action can
be taken. Normally, we would expect them to be displayed on the biplot maps. Though
the R functions are available, this would be futile with the present example as the two-
dimensional diagrams would be cluttered by useless 100% values, so it has not been done.
Elsewhere in the book, this kind of information has been displayed (see, for example,
Figure 4.1).
Finally, we show how to add new variables to a CVA biplot. We can use almost exactly
the same procedure that allowed us to construct the PCA biplot of the full Ocotea data set in
Figure 3.24 for adding the newly constructed variables VLDratio and RHWratio. Instead
of a call to PCAbipl , the call is made to CVAbipl . The only changes needed are omitting
the argument scaled.mat , adding G = indmat(Ocotea.data[,2]) and providing
for plotting symbols and colours. The CVA biplot with the two newly added variables is
shown in Figure 4.13. Note that we have also interactively shifted the origin. The reader
can verify by calling CVA.predictivities that the axis predictivity for VLDratio is
0.0728 and that of RHWratio is 0.2326 in one dimension. In the two-dimensional biplot
shown, both attain axis predictivity of unity.
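The call described above can be sketched in R. This is an outline under stated assumptions, not the book's verbatim code: the argument G = indmat(Ocotea.data[,2]) is taken from the text, but the remaining argument names (data column selection, plotting symbols and colours, and the argument carrying the new variables) are assumptions about the UBbipl interface.

```r
# Illustrative sketch only -- apart from G, the argument names below
# are assumptions about the CVAbipl() interface.
library(UBbipl)

CVAbipl(X = Ocotea.data[, -(1:2)],            # numeric variables (selection assumed)
        G = indmat(Ocotea.data[, 2]),         # group indicator matrix, as in the text
        colours = c("red", "green", "blue"),  # plotting colours (assumed argument name)
        pch.samples = 15:17)                  # plotting symbols (assumed argument name)
```

Note that, unlike the PCA call, no scaled.mat argument is supplied; the within-group covariance structure used by CVA makes that form of scaling unnecessary.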
4.9 CVA biplots for two classes
With only two classes, the points representing the two canonical means are necessarily
collinear. The predictive biplot axes are then superimposed onto this single dimension
and hence cannot be distinguished; a solution is to present them separately, as is shown
in the example in Section 4.9.1. We may consider whether the dimensions orthogonal to
the collinear means may be exploited. A potentially important use is for two-dimensional
representations, where we have a spare unused dimension. A problem is that the scaling
L′WL = I implies the homogeneity of the dispersion of the canonical variables orthog-
onal to the space containing the means of the canonical variables. Then no preferential
direction exists. We are investigating the possibility of finding a useful criterion that may
be optimized in the orthogonal space to give points that may then be combined with the
dimension holding the means.
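The collinearity of the two canonical means can be verified directly. The short base-R demonstration below uses simulated data (all names and the data are illustrative, not from the Ocotea example): with two groups the weighted between-group matrix B has rank 1, so the group means span only a single canonical dimension.

```r
# Self-contained demonstration (base R only) that with two groups the
# between-group SSP matrix B has rank 1, so the two canonical means
# are necessarily collinear.  Data are simulated for illustration.
set.seed(1)
n <- c(20, 30)                                  # two group sizes
X <- rbind(matrix(rnorm(n[1] * 3),           ncol = 3),
           matrix(rnorm(n[2] * 3, mean = 1), ncol = 3))
g <- rep(1:2, n)

means <- rowsum(X, g) / n                       # group means
gmean <- colSums(X) / sum(n)                    # overall weighted mean
D     <- sweep(means, 2, gmean)                 # centred group means
B     <- t(D * n) %*% D                         # weighted between-group SSP
qr(B)$rank                                      # 1: the means define one dimension
```

Because n[1]*D[1,] + n[2]*D[2,] = 0, the two centred mean vectors are proportional, which is exactly why the predictive biplot axes all collapse onto this single dimension.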
4.9.1 An example of two-class CVA biplots
Gender remuneration inequalities at universities have been studied in various parts of
the world (see, for example, Barbezat and Hughes, 2005; McNabb and Wass, 1997;
Ward, 2001; Warman et al., 2006). Although researchers were able to attribute part of the