We may evaluate an average K × K matrix,

$$\bar{\mathbf{D}} = \mathbf{N}^{-1}\mathbf{G}'\mathbf{D}\mathbf{G}\mathbf{N}^{-1}, \tag{5.21}$$
from which we may derive

$$\delta_{hk} = -\tfrac{1}{2}\left(\bar{D}_{kk} + \bar{D}_{hh} - 2\bar{D}_{hk}\right). \tag{5.22}$$
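As a minimal NumPy sketch of (5.21) and (5.22), we assume that $\mathbf{D}$ already holds the ddistances $-\tfrac{1}{2}d_{ij}^2$ and that group membership is coded as integer labels $0, \ldots, K-1$. The function names `group_average_ddistances` and `centroid_ddistances` are hypothetical, chosen only for illustration.

```python
import numpy as np

def group_average_ddistances(D, labels, K):
    """Dbar = N^{-1} G' D G N^{-1}, as in (5.21).

    D      : n x n matrix of ddistances {-1/2 d_ij^2}
    labels : length-n integer array of group codes 0..K-1
    """
    n = D.shape[0]
    G = np.zeros((n, K))
    G[np.arange(n), labels] = 1.0          # n x K indicator matrix
    N_inv = np.diag(1.0 / G.sum(axis=0))   # N = diag(n_1, ..., n_K)
    return N_inv @ G.T @ D @ G @ N_inv

def centroid_ddistances(Dbar):
    """Delta = {delta_hk}, delta_hk = -1/2 (Dbar_kk + Dbar_hh - 2 Dbar_hk), as in (5.22)."""
    dg = np.diag(Dbar)
    return -0.5 * (dg[None, :] + dg[:, None] - 2.0 * Dbar)

# Illustrative data: Euclidean ddistances for 10 samples in 3 groups
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
labels = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
D = -0.5 * np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
Dbar = group_average_ddistances(D, labels, K=3)
Delta = centroid_ddistances(Dbar)
```

Because $\boldsymbol{\Delta}$ stores ddistances rather than distances, its diagonal is zero and its off-diagonal entries are non-positive.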
Gower and Hand (1996, p. 249) show that the quantity in parentheses is the squared
distance between the centroids of groups $h$ and $k$. Defining $\mathbf{g}_k$ to be the $k$th column of $\mathbf{G}$, then

$$\bar{D}_{hk} = \frac{1}{n_h n_k}\,\mathbf{g}_h'\mathbf{D}\mathbf{g}_k$$

gives the average of the ddistances between the members of the $h$th and $k$th groups; when $h = k$, the zero diagonals and repeated symmetric values are all included.
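The element-wise form of $\bar{D}_{hk}$ and the Gower and Hand result can be checked numerically. The sketch below assumes Euclidean (Pythagorean) distances on randomly generated, purely illustrative data; it computes each $\bar{D}_{hk}$ as $\mathbf{g}_h'\mathbf{D}\mathbf{g}_k/(n_h n_k)$ and compares the parenthesised quantity in (5.22) with the squared distances between the group centroids.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))                  # 12 samples, 3 variables (illustrative)
labels = np.array([0] * 4 + [1] * 5 + [2] * 3)
K = 3

# ddistances D = {-1/2 d_ij^2} from Euclidean distances
D = -0.5 * np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

# Dbar_hk = g_h' D g_k / (n_h n_k): the mean of the (h, k) block of D,
# which for h = k includes the zero diagonal and both copies of each symmetric value
G = (labels[:, None] == np.arange(K)).astype(float)
n = G.sum(axis=0)
Dbar = np.array([[G[:, h] @ D @ G[:, k] / (n[h] * n[k]) for k in range(K)]
                 for h in range(K)])

# Quantity in parentheses in (5.22) versus squared distances between centroids
dg = np.diag(Dbar)
paren = dg[None, :] + dg[:, None] - 2.0 * Dbar
centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
sq_centroid = np.sum((centroids[:, None, :] - centroids[None, :, :]) ** 2, axis=-1)
```

For Euclidean ddistances, `paren` and `sq_centroid` should agree to rounding error.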
The $\delta_{hk}$ in (5.22) may be assembled into a ddistance matrix $\boldsymbol{\Delta} = \{\delta_{hk}\}: K \times K$,
which may be approximated by any method of multidimensional scaling (see above) to
give a map of the group means analogous to the map of canonical means given by CVA.
In the following we shall use PCO. This completes the between-group map. We show in
Section 5.5 that when we restrict the distances to be Pythagorean, a PCO of $\boldsymbol{\Delta}$ results in a PCA of the group means, as illustrated for the Ocotea data in Figure 5.33. If in $\mathbf{D}$
we defined the Mahalanobis distance between all pairs of samples, rather than just the
sample means, PCO would recover the CVA of the canonical means. With other choices
of embeddable distance and MDS, different analyses and representations will ensue.
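The PCO step can be sketched as classical scaling of $\boldsymbol{\Delta}$: double-centre it with $\mathbf{J} = \mathbf{I} - \mathbf{1}\mathbf{1}'/K$ and take eigenvectors scaled by the square roots of the positive eigenvalues. The hypothetical `pco` function below is a minimal sketch under that assumption. With Pythagorean distances between the group means, its eigenvalues should equal the squared singular values of the centred (unweighted) means, and its coordinates should match the PCA scores up to the sign of each column.

```python
import numpy as np

def pco(Delta, tol=1e-10):
    """Classical scaling of a K x K ddistance matrix Delta = {-1/2 d^2}.

    Returns coordinates Y (K x r) and the r positive eigenvalues of B = J Delta J.
    """
    K = Delta.shape[0]
    J = np.eye(K) - np.ones((K, K)) / K
    lam, V = np.linalg.eigh(J @ Delta @ J)   # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]            # reorder, largest first
    lam, V = lam[order], V[:, order]
    keep = lam > tol * abs(lam[0])           # drop zero / negative eigenvalues
    return V[:, keep] * np.sqrt(lam[keep]), lam[keep]

# Illustrative data: 15 samples, 3 variables, 4 groups
rng = np.random.default_rng(1)
X = rng.normal(size=(15, 3))
labels = np.repeat([0, 1, 2, 3], [3, 4, 5, 3])
means = np.array([X[labels == k].mean(axis=0) for k in range(4)])
Delta = -0.5 * np.sum((means[:, None, :] - means[None, :, :]) ** 2, axis=-1)
Y, lam = pco(Delta)

# PCA of the group means, centred at their unweighted average
Mc = means - means.mean(axis=0)
U, s, Vt = np.linalg.svd(Mc, full_matrices=False)
scores = U * s
```

Here $\boldsymbol{\Delta}$ is built directly from the centroid coordinates, which for Pythagorean distances gives the same matrix as (5.22).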
So far, we have been concerned with between-group structure, but when PCO is used
to represent $\boldsymbol{\Delta}$, it is relatively easy to add points representing the individual samples. To
do this we repeatedly use the technique described in Section 5.4.1 for adding a point P
to a PCO. This requires the distances from the new point to all the n original points. We
assume that these are given in a column vector $\mathbf{d}: n \times 1$ of elements

$$\mathbf{d} = (\mathbf{d}_1', \mathbf{d}_2', \ldots, \mathbf{d}_K')', \tag{5.23}$$

where $\mathbf{d}_k$ is a vector of size $n_k$ giving the ddistances of the new point from the samples
in the $k$th group. Note that $\mathbf{d}$ may represent a completely new sample, but often it will
be one of the columns of $\mathbf{D}$. Denoting the centroids of the groups by $G_1, G_2, \ldots, G_K$, $\boldsymbol{\Delta}$
gives the ddistances between all pairs of centroids. We need the squared distances of
$P$ from every centroid and the squared distances of every centroid from the overall centroid $G$,
say. The latter are given by

$$(\mathbf{1}'\boldsymbol{\Delta}\mathbf{1})\,\mathbf{1}/K^2 - 2\boldsymbol{\Delta}\mathbf{1}/K. \tag{5.24}$$
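Since (5.24) weights every centroid by $1/K$, the overall centroid $G$ here is the unweighted mean of the $K$ group centroids. A minimal sketch (the function name is hypothetical) evaluates (5.24) from $\boldsymbol{\Delta}$ and checks it against explicit coordinates of some illustrative centroids.

```python
import numpy as np

def centroid_sqdist_from_G(Delta):
    """Squared distances of the K centroids from their unweighted mean G, as in (5.24).

    Delta : K x K matrix of centroid ddistances {-1/2 d^2}
    """
    K = Delta.shape[0]
    one = np.ones(K)
    return (one @ Delta @ one) * one / K**2 - 2.0 * (Delta @ one) / K

# Check against explicit coordinates of K = 4 illustrative centroids
rng = np.random.default_rng(3)
M = rng.normal(size=(4, 2))
Delta = -0.5 * np.sum((M[:, None, :] - M[None, :, :]) ** 2, axis=-1)
b = centroid_sqdist_from_G(Delta)
direct = np.sum((M - M.mean(axis=0)) ** 2, axis=1)
```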
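The squared distance of $P$ from $G_k$ is given by

$$\delta_k^2 = \bar{D}_{kk} - 2n_k^{-1}\mathbf{1}'\mathbf{d}_k, \tag{5.25}$$

which may be assembled into a ddistance vector $\boldsymbol{\delta} = \left(-\tfrac{1}{2}\delta_1^2, -\tfrac{1}{2}\delta_2^2, \ldots, -\tfrac{1}{2}\delta_K^2\right)'$.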
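The pieces above can be combined to place a sample $P$ on the PCO map of the group means. The sketch below is an illustration under stated assumptions: it uses the standard add-a-point formula for classical scaling, $\mathbf{z} = \tfrac{1}{2}\boldsymbol{\Lambda}^{-1}\mathbf{Y}'(\mathbf{b} - \boldsymbol{\delta}^2)$, where $\mathbf{Y}$ and $\boldsymbol{\Lambda}$ come from the PCO of $\boldsymbol{\Delta}$, $\mathbf{b}$ is (5.24) and $\boldsymbol{\delta}^2$ holds the $\delta_k^2$ of (5.25). The form given in Section 5.4.1 may differ in detail; the data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))            # illustrative samples
labels = np.repeat(np.arange(4), 5)     # K = 4 groups of 5
K = 4
x_new = rng.normal(size=3)              # the new sample P

# ddistances among samples, and the vector d of ddistances from P (5.23)
D = -0.5 * np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
d = -0.5 * np.sum((X - x_new) ** 2, axis=1)

# (5.21) and (5.22)
G = (labels[:, None] == np.arange(K)).astype(float)
N_inv = np.diag(1.0 / G.sum(axis=0))
Dbar = N_inv @ G.T @ D @ G @ N_inv
dg = np.diag(Dbar)
Delta = -0.5 * (dg[:, None] + dg[None, :] - 2.0 * Dbar)

# (5.25): squared distances of P from each centroid; delta_vec is the ddistance vector
delta_sq = np.array([Dbar[k, k] - 2.0 * d[labels == k].mean() for k in range(K)])
delta_vec = -0.5 * delta_sq

# PCO (classical scaling) of Delta, keeping positive eigenvalues
J = np.eye(K) - np.ones((K, K)) / K
lam, V = np.linalg.eigh(J @ Delta @ J)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]
keep = lam > 1e-10 * lam[0]
lam, Y = lam[keep], V[:, keep] * np.sqrt(lam[keep])

# (5.24), then the assumed add-a-point formula z = 1/2 Lambda^{-1} Y'(b - delta^2)
one = np.ones(K)
b = (one @ Delta @ one) * one / K**2 - 2.0 * (Delta @ one) / K
z = 0.5 * (Y.T @ (b - delta_sq)) / lam
```

In this example the four centroids span all three dimensions, so the added point reproduces its squared distances to every centroid exactly; when the map has fewer dimensions than the data, the same formula gives a projection instead.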