The biplot in the bottom panel of Figure 5.33 results from the following function
call:
PCAbipl(X = means.mat.scaled, G = indmat(1:3),
pch.samples = rep(15,3), pch.samples.size = 1.25,
pch.new.labels.size = 0.6, pch.new.size = 1,
colours = UBcolours[(1:3)+12], X.new.samples = data,
pch.new = 16, pch.new.labels = 1:37,
pch.new.cols = rep(UBcolours[1:3],c(20,7,15)), exp.factor = 2,
markers = FALSE, pos = "Hor", offset = c(0.02, 0.3, 0.1, 0.1),
offset.m = rep(-0.1, 6), n.int = rep(2,6))
First we compare this representation with that given in Figure 4.1. The top panel
is concerned with unnormalized data which, as we have seen, can have a profound
effect on PCA, as is verified by comparing with the bottom panel showing an
unstructured but normalized PCA. The normalization may be regarded as a first
step towards removing the incommensurabilities that are fully handled by CVA
itself (Figure 4.2). In Figure 5.33 (bottom panel) the three means are represented
exactly in two dimensions, as in CVA, roughly at the vertices of an equilateral
triangle and, as might be expected, with a different orientation from that in Figure 4.1
(bottom panel). The between-group dispersion is much clearer in Figure 5.33 and
has much less overlap than in Figure 4.1 (bottom panel). The grouping given by
CVA in Figure 4.2 is even clearer but looks remarkably similar to Figure 5.33. Apart
from Numves, the corresponding biplot axes in Figure 4.2 and the bottom panel of
Figure 5.33 are almost identical. The PCA of the group means with added within-group
dispersion has worked very well and might be considered a model for further
development.
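The idea behind the bottom panel of Figure 5.33, a PCA of the group means with the individual samples interpolated afterwards as supplementary points, can be sketched in R as follows. The random data, the number of variables and the group sizes here are purely illustrative, not the book's data:

```r
# PCA of K = 3 group means; all n samples then interpolated into the
# means' principal component space. An illustrative sketch of the idea
# behind the Figure 5.33 biplot, using random data.
set.seed(3)
X <- matrix(rnorm(42 * 5), 42, 5)            # n = 42 samples, 5 variables
g <- rep(1:3, times = c(20, 7, 15))          # three groups
means <- rowsum(X, g) / as.vector(table(g))  # 3 x 5 matrix of group means
pca <- prcomp(means)                         # PCA of the means only
means.2d <- pca$x[, 1:2]                     # 3 centred means lie exactly in 2D
samples.2d <- scale(X, center = pca$center, scale = FALSE) %*%
  pca$rotation[, 1:2]                        # samples as supplementary points
```

Because three centred points span at most two dimensions, the means are represented exactly, while the within-group dispersion is carried by the interpolated samples.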
We could proceed as in the above PCA example by doing a PCO of D. Then we
would evaluate the group means to produce a map of the K group means by using
PCA. Finally, we would rotate all n samples so that the group means occupy the first
K − 1 dimensions and show the within-group samples in this space. A problem with
this approach is that n may be very large, entailing a massive eigendecomposition. This
problem can be avoided by using the methodology described below, which requires the
whole of D but the eigenstructure of only a K × K matrix.
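The PCO step just described can be sketched from first principles (cmdscale in R's stats package does the same job; the function below is purely illustrative, not the book's own routine):

```r
# Principal coordinate analysis (PCO) of a matrix of distances d_ij:
# a minimal illustrative sketch.
pco <- function(dmat) {
  n <- nrow(dmat)
  D <- -0.5 * dmat^2                    # ddistance matrix {-1/2 d_ij^2}
  J <- diag(n) - matrix(1 / n, n, n)    # centring matrix I - 11'/n
  B <- J %*% D %*% J                    # double-centred inner products
  e <- eigen(B, symmetric = TRUE)
  keep <- e$values > 1e-9               # retain positive eigenvalues only
  e$vectors[, keep, drop = FALSE] %*%
    diag(sqrt(e$values[keep]), nrow = sum(keep))
}

# Euclidean distances are recovered exactly from the PCO coordinates.
set.seed(1)
X <- matrix(rnorm(20 * 4), 20, 4)
Y <- pco(as.matrix(dist(X)))
max(abs(as.matrix(dist(Y)) - as.matrix(dist(X))))  # effectively zero
```

Note that eigen() here operates on the full n × n matrix; this is exactly the cost that the AoD methodology avoids.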
The method followed in constructing the biplots in Figure 5.33 may be generalized
to an analysis of distance (AoD) where the ddistances between all pairs of samples are
available in the form of an n × n matrix D = {−½ d²_ij}. Distances may be defined very
generally, though it is desirable that they be Euclidean embeddable, as we assume here.
In addition, as above, we have grouping information available in G (see Section 4.2)
with associated partitioning of D conveniently written as

    D = [ D_11  D_12  ...  D_1K ]
        [ D_21  D_22  ...  D_2K ]
        [  ...   ...  ...  ...  ]
        [ D_K1  D_K2  ...  D_KK ],
though there is no requirement in the following that the n samples be presented in the
implied group-by-group order.
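As a concrete illustration of how the whole of D together with the grouping information can yield a small K × K problem, the ddistances between the group centroids can be assembled from the block means of D alone. This follows from the centroid property of Euclidean-embeddable ddistances; the function and data below are an illustrative sketch, not the book's code:

```r
# From the full n x n ddistance matrix D = {-1/2 d_ij^2} and the group
# indicator matrix G (n x K), obtain the K x K ddistance matrix between
# the group centroids, using only block means of D.
group_mean_ddist <- function(D, G) {
  H <- G %*% diag(1 / colSums(G))  # columns average within each group
  M <- t(H) %*% D %*% H            # block means (1' D_kl 1) / (n_k n_l)
  K <- nrow(M)
  m <- diag(M)
  # ddistance between centroids k and l: M_kl - M_kk/2 - M_ll/2
  M - 0.5 * matrix(m, K, K) - 0.5 * matrix(m, K, K, byrow = TRUE)
}

# Check against ddistances computed directly from the group centroids.
set.seed(1)
X <- matrix(rnorm(30 * 4), 30, 4)
g <- rep(1:3, each = 10)
G <- model.matrix(~ factor(g) - 1)           # indicator matrix (Section 4.2)
means <- rowsum(X, g) / as.vector(table(g))
Dbar.direct <- -0.5 * as.matrix(dist(means))^2
D <- -0.5 * as.matrix(dist(X))^2
max(abs(group_mean_ddist(D, G) - Dbar.direct))  # effectively zero
```

Only the K × K result needs an eigendecomposition, however large n may be.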