We may evaluate an average $K \times K$ matrix,
$$\bar{\mathbf{D}} = \mathbf{N}^{-1}\mathbf{G}'\mathbf{D}\mathbf{G}\mathbf{N}^{-1}, \tag{5.21}$$
from which we may derive
$$\Delta_{hk} = -\tfrac{1}{2}\left(\bar{D}_{kk} + \bar{D}_{hh} - 2\bar{D}_{hk}\right). \tag{5.22}$$
Gower and Hand (1996, p. 249) show that the quantity in parentheses is the squared distance between the centroids of groups $h$ and $k$. Defining $\mathbf{g}_k$ to be the $k$th column of $\mathbf{G}$, then
$$\bar{D}_{hk} = \frac{1}{n_h n_k}\,\mathbf{g}_h'\mathbf{D}\mathbf{g}_k$$
gives the average of the ddistances between the members of the $h$th and $k$th groups; when $h = k$ the zero diagonals and repeated symmetric values are all included.
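The group averaging above can be sketched numerically. This is a minimal illustration, assuming that $\mathbf{D}$ holds ddistances $-\tfrac{1}{2}d_{ij}^2$, that $\mathbf{G}$ is the $n \times K$ group indicator matrix and that $\mathbf{N}$ is the diagonal matrix of group sizes; the function name `group_ddistances` and the toy data are hypothetical and not part of the original text.

```python
import numpy as np

def group_ddistances(D, labels):
    """Return (D_bar, Delta) for an n x n ddistance matrix D = {-d_ij^2 / 2}.

    D_bar follows (5.21) and Delta follows (5.22); labels gives the group
    membership of each of the n samples.
    """
    labels = np.asarray(labels)
    groups = np.unique(labels)
    # n x K indicator matrix: G[i, k] = 1 if sample i belongs to group k
    G = (labels[:, None] == groups[None, :]).astype(float)
    # Inverse of the diagonal matrix of group sizes
    N_inv = np.diag(1.0 / G.sum(axis=0))
    D_bar = N_inv @ G.T @ D @ G @ N_inv                    # (5.21)
    diag = np.diag(D_bar)
    Delta = -0.5 * (diag[:, None] + diag[None, :] - 2.0 * D_bar)  # (5.22)
    return D_bar, Delta

# Toy data (assumed): five samples in two groups, Pythagorean distances
X = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 3.0], [6.0, 3.0], [4.0, 5.0]])
labels = np.array([0, 0, 1, 1, 1])
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
D = -0.5 * sq                                              # ddistances
D_bar, Delta = group_ddistances(D, labels)
```

For Pythagorean distances, `-2 * Delta` should agree with the squared distances between the ordinary group means, which gives a quick check on the averaging.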
The $\Delta_{hk}$ in (5.22) may be assembled into a ddistance matrix $\boldsymbol{\Delta} = \{\Delta_{hk}\} : K \times K$, which may be approximated by any method of multidimensional scaling (see above) to give a map of the group means analogous to the map of canonical means given by CVA. In the following we shall use PCO. This completes the between-group map. We show in Section 5.5 that when we restrict the distances to be Pythagorean, a PCO of $\boldsymbol{\Delta}$ results in a PCA of the group means, as illustrated for the Ocotea data in Figure 5.33. If in $\mathbf{D}$ we defined the Mahalanobis distance between all pairs of samples, rather than just the sample means, PCO would recover the CVA of the canonical means. With other choices of embeddable distance and MDS, different analyses and representations will ensue.
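As a rough illustration of the PCO step, the sketch below doubly centres a $K \times K$ ddistance matrix and takes its spectral decomposition. It assumes unweighted centring (each centroid counts equally) and uses made-up group means; the function name `pco` and the tolerance are hypothetical choices, not taken from the text.

```python
import numpy as np

def pco(Delta, tol=1e-10):
    """Principal coordinates from a ddistance matrix Delta = {-d_hk^2 / 2}.

    Returns the coordinates (one row per point) and the positive eigenvalues,
    largest first. Centring is unweighted, an assumption of this sketch.
    """
    K = Delta.shape[0]
    C = np.eye(K) - np.ones((K, K)) / K       # centring matrix
    B = C @ Delta @ C                         # doubly centred inner products
    evals, evecs = np.linalg.eigh(B)          # eigh returns ascending order
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    keep = evals > tol                        # drop zero/negative dimensions
    return evecs[:, keep] * np.sqrt(evals[keep]), evals[keep]

# Hypothetical group means in two dimensions
means = np.array([[1.0, 0.0], [5.0, 3.0], [2.0, 6.0]])
sq = ((means[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
Delta = -0.5 * sq
Y, evals = pco(Delta)
```

With Pythagorean distances the rows of `Y` reproduce the distances between the group means, and `evals` match the eigenvalues from a PCA of the centred means, in line with the statement above.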
So far, we have been concerned with between-group structure, but when PCO is used to represent $\boldsymbol{\Delta}$, it is relatively easy to add points representing the individual samples. To do this we repeatedly use the technique described in Section 5.4.1 for adding a point P to a PCO. This requires the distances from the new point to all the $n$ original points. We assume that these are given in a column vector $\mathbf{d} : n \times 1$ of elements
$$\mathbf{d}' = (\mathbf{d}_1', \mathbf{d}_2', \ldots, \mathbf{d}_K'), \tag{5.23}$$
where $\mathbf{d}_k$ is a vector of size $n_k$ giving the ddistances of the new point from the samples in the $k$th group. Note that $\mathbf{d}$ may represent a completely new sample, but often it will be one of the columns of $\mathbf{D}$. Denoting the centroids of the groups by $G_1, G_2, \ldots, G_K$, $\boldsymbol{\Delta}$ gives the ddistances between all pairs of centroids. We need the squared distances of P from every centroid and the ddistances of every centroid from the overall centroid G, say. The latter is given by
$$(\mathbf{1}'\boldsymbol{\Delta}\mathbf{1})\mathbf{1}/K^2 - 2\boldsymbol{\Delta}\mathbf{1}/K. \tag{5.24}$$
The squared distance of P from $G_k$ is given by
$$\delta_k^2 = \bar{D}_{kk} - \frac{2}{n_k}\,\mathbf{1}'\mathbf{d}_k, \tag{5.25}$$
which may be assembled into a ddistance vector
$$\boldsymbol{\delta} = \left(-\tfrac{1}{2}\delta_1^2,\; -\tfrac{1}{2}\delta_2^2,\; \ldots,\; -\tfrac{1}{2}\delta_K^2\right).$$
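The computation in (5.25) can be sketched as follows. This is an illustration only, assuming Pythagorean distances, a ddistance matrix $\mathbf{D}$ for the original samples and a ddistance vector $\mathbf{d}$ for the new point P; the function name `centroid_ddistances` and the toy data are hypothetical.

```python
import numpy as np

def centroid_ddistances(D, labels, d):
    """Squared distances of a new point from each group centroid, as in (5.25),
    together with the corresponding ddistance vector (-1/2 times each value).

    D : n x n ddistances among the original samples
    d : length-n ddistances from the new point to the original samples
    """
    labels = np.asarray(labels)
    groups = np.unique(labels)
    delta_sq = np.empty(len(groups))
    for i, g in enumerate(groups):
        idx = labels == g
        n_k = idx.sum()
        # Average within-group ddistance, diagonal zeros included
        D_bar_kk = D[np.ix_(idx, idx)].sum() / n_k**2
        delta_sq[i] = D_bar_kk - 2.0 * d[idx].sum() / n_k   # (5.25)
    return delta_sq, -0.5 * delta_sq

# Toy data (assumed): five samples in two groups and one new point P
X = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 3.0], [6.0, 3.0], [4.0, 5.0]])
labels = np.array([0, 0, 1, 1, 1])
D = -0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
P = np.array([1.0, 1.0])
d = -0.5 * ((X - P) ** 2).sum(axis=1)
delta_sq, delta = centroid_ddistances(D, labels, d)
```

In this Pythagorean setting `delta_sq` should equal the squared Euclidean distances from P to the ordinary group means, so the formula can be checked without ever forming the centroids explicitly.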