Multidimensional scaling and nonlinear biplots - Understanding Biplots

We may evaluate an average K × K matrix,

$$\bar{\mathbf{D}} = \mathbf{N}^{-1}\mathbf{G}'\mathbf{D}\mathbf{G}\mathbf{N}^{-1}, \tag{5.21}$$

from which we may derive

$$\Delta_{hk} = -\tfrac{1}{2}\,(\bar{D}_{kk} + \bar{D}_{hh} - 2\bar{D}_{hk}). \tag{5.22}$$

Gower and Hand (1996, p. 249) show that the quantity in parentheses is the squared distance between the centroids of groups $h$ and $k$. Defining $\mathbf{g}_k$ to be the $k$th column of $\mathbf{G}$, then

$$\bar{D}_{hk} = \frac{1}{n_h n_k}\,\mathbf{g}_h'\mathbf{D}\mathbf{g}_k$$

gives the average of the ddistances between the members of the $h$th and $k$th groups; when $h = k$ the zero diagonals and repeated symmetric values are all included.
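The averaging step and the centroid identity can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes that a ddistance is −½ times a squared Euclidean distance, and the toy data, group labels and helper names (`sq_dist`, `average_ddistances`, `centroid_ddistances`) are hypothetical. It builds the average matrix entry by entry, forms the between-centroid ddistances as −½ times the quantity in parentheses in (5.22), and compares them with values computed directly from the group centroids.

```python
# Sketch of (5.21)-(5.22) on toy data; data and helper names are hypothetical.
# Assumption: a ddistance is -1/2 times a squared Euclidean distance.

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def average_ddistances(D, labels, K):
    """K x K matrix: entry (h, k) averages D over rows in group h, columns in group k."""
    members = [[i for i, g in enumerate(labels) if g == k] for k in range(K)]
    return [[sum(D[i][j] for i in members[h] for j in members[k])
             / (len(members[h]) * len(members[k]))
             for k in range(K)]
            for h in range(K)]

def centroid_ddistances(D_bar):
    """Between-centroid ddistances: -1/2 (D_kk + D_hh - 2 D_hk)."""
    K = len(D_bar)
    return [[-0.5 * (D_bar[k][k] + D_bar[h][h] - 2.0 * D_bar[h][k])
             for k in range(K)]
            for h in range(K)]

# Eight toy samples in three groups.
X = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0),
     (5.0, 5.0), (7.0, 5.0),
     (1.0, 9.0), (3.0, 9.0), (2.0, 6.0)]
labels = [0, 0, 0, 1, 1, 2, 2, 2]
K = 3

D = [[-0.5 * sq_dist(a, b) for b in X] for a in X]
D_bar = average_ddistances(D, labels, K)
Delta = centroid_ddistances(D_bar)

# Direct computation from the group centroids, for comparison.
centroids = []
for k in range(K):
    pts = [x for x, g in zip(X, labels) if g == k]
    centroids.append(tuple(sum(c) / len(pts) for c in zip(*pts)))
direct = [[-0.5 * sq_dist(a, b) for b in centroids] for a in centroids]

max_gap = max(abs(Delta[h][k] - direct[h][k])
              for h in range(K) for k in range(K))
print(max_gap)  # expected to be zero up to rounding error
```

The within-group averages include the zero diagonal and both orderings of each pair, matching the remark about the case h = k.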

The $\Delta_{hk}$ in (5.22) may be assembled into a ddistance matrix $\boldsymbol{\Delta} = \{\Delta_{hk}\}$ : K × K, which may be approximated by any method of multidimensional scaling (see above) to give a map of the group means analogous to the map of canonical means given by CVA. In the following we shall use PCO. This completes the between-group map. We show in Section 5.5 that when we restrict the distances to be Pythagorean, a PCO of $\boldsymbol{\Delta}$ results in a PCA of the group means, as illustrated for the Ocotea data in Figure 5.33. If in $\mathbf{D}$ we defined the Mahalanobis distance between all pairs of samples, rather than just the sample means, PCO would recover the CVA of the canonical means. With other choices of embeddable distance and MDS, different analyses and representations will ensue.
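One way to see why a PCO of the between-centroid ddistances reduces to a PCA of the group means in the Pythagorean case is to look at the matrix that PCO decomposes. The sketch below is illustrative only and makes several assumptions: ddistances are −½ times squared Euclidean distances, the centring is unweighted (every group counts equally), and the group means and helper names are made up. It double-centres the ddistance matrix and checks that the result equals the inner-product matrix of the centred means, whose spectral decomposition yields the principal component scores of the means.

```python
# Sketch: double-centring a Pythagorean ddistance matrix recovers the
# inner products of the centred group means. Data and names are hypothetical.
# Assumptions: ddistance = -1/2 squared Euclidean distance; unweighted centring.

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def double_centre(M):
    """Return (I - 11'/K) M (I - 11'/K) for a square matrix M."""
    K = len(M)
    row_mean = [sum(row) / K for row in M]
    col_mean = [sum(M[i][j] for i in range(K)) / K for j in range(K)]
    grand_mean = sum(row_mean) / K
    return [[M[i][j] - row_mean[i] - col_mean[j] + grand_mean
             for j in range(K)]
            for i in range(K)]

# Four hypothetical group means in two dimensions.
means = [(1.0, 0.0), (4.0, 2.0), (0.0, 5.0), (3.0, 3.0)]
K = len(means)

Delta = [[-0.5 * sq_dist(a, b) for b in means] for a in means]
B = double_centre(Delta)

# Inner products of the means after removing their unweighted average.
centre = tuple(sum(c) / K for c in zip(*means))
centred = [tuple(x - c for x, c in zip(m, centre)) for m in means]
gram = [[sum(u_i * v_i for u_i, v_i in zip(u, v)) for v in centred]
        for u in centred]

max_gap = max(abs(B[i][j] - gram[i][j]) for i in range(K) for j in range(K))
print(max_gap)  # expected to be zero up to rounding error
```

With a non-Pythagorean choice of distance the double-centred matrix is no longer an inner-product matrix of the raw means, so this identification would not be expected to hold.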

So far, we have been concerned with between-group structure, but when PCO is used to represent $\boldsymbol{\Delta}$, it is relatively easy to add points representing the individual samples. To do this we repeatedly use the technique described in Section 5.4.1 for adding a point P to a PCO. This requires the distances from the new point to all the n original points. We assume that these are given in a column vector $\mathbf{d}$ : n × 1 of elements

$$\mathbf{d}' = (\mathbf{d}_1', \mathbf{d}_2', \ldots, \mathbf{d}_K'), \tag{5.23}$$

where $\mathbf{d}_k$ is a vector of size $n_k$ giving the ddistances of the new point from the samples in the $k$th group. Note that $\mathbf{d}$ may represent a completely new sample, but often it will be one of the columns of $\mathbf{D}$. Denoting the centroids of the groups by $G_1, G_2, \ldots, G_K$, $\boldsymbol{\Delta}$ gives the ddistances between all pairs of centroids. We need the squared distances of P from every centroid and the ddistances of every centroid from the overall centroid G, say. The latter is given by

$$(\mathbf{1}'\boldsymbol{\Delta}\mathbf{1})\,\mathbf{1}/K^2 - 2\boldsymbol{\Delta}\mathbf{1}/K. \tag{5.24}$$
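As a numerical illustration, the expression in (5.24) can be read as (1'Δ1)1/K² − 2Δ1/K, where Δ is the K × K matrix of between-centroid ddistances and 1 is a vector of ones. The sketch below rests on assumptions: ddistances are taken as −½ times squared Euclidean distances, G is taken as the unweighted mean of the group centroids, and the centroid coordinates and function name are hypothetical. Under those assumptions the expression reproduces the squared distance of each centroid from G; multiplying by −½ would put the values on the ddistance scale.

```python
# Sketch of expression (5.24); centroid coordinates and names are hypothetical.
# Assumptions: ddistance = -1/2 squared Euclidean distance; G is the
# unweighted mean of the group centroids.

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def from_overall_centroid(Delta):
    """Evaluate (1' Delta 1) 1 / K^2 - 2 Delta 1 / K elementwise."""
    K = len(Delta)
    row_sums = [sum(row) for row in Delta]   # Delta 1
    total = sum(row_sums)                    # 1' Delta 1
    return [total / K ** 2 - 2.0 * row_sums[k] / K for k in range(K)]

centroids = [(1.0, 0.0), (4.0, 2.0), (0.0, 5.0), (3.0, 3.0)]
K = len(centroids)
Delta = [[-0.5 * sq_dist(a, b) for b in centroids] for a in centroids]

values = from_overall_centroid(Delta)

# Direct squared distances from the unweighted overall centroid.
G = tuple(sum(c) / K for c in zip(*centroids))
direct = [sq_dist(c, G) for c in centroids]

max_gap = max(abs(v - w) for v, w in zip(values, direct))
print(max_gap)  # expected to be zero up to rounding error
```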

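The squared distance of P from $G_k$ is given by

$$\delta_k^2 = \bar{D}_{kk} - \frac{2}{n_k}\,\mathbf{1}'\mathbf{d}_k, \tag{5.25}$$

which may be assembled into a ddistance vector $\boldsymbol{\delta} = (-\tfrac{1}{2}\delta_1^2, -\tfrac{1}{2}\delta_2^2, \ldots, -\tfrac{1}{2}\delta_K^2)'$.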
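The squared distance of a new point from each group centroid needs only the within-group average ddistance and the ddistances from the new point to that group's members. The following sketch is an illustration under stated assumptions, not the book's implementation: ddistances are −½ times squared Euclidean distances, and the toy samples, the point `P` and the function name `sq_dist_to_centroids` are hypothetical. It evaluates the within-group average minus 2/n_k times the sum of the new point's ddistances to group k, then rescales by −½ to form the ddistance vector.

```python
# Sketch of (5.25) on toy data; data and helper names are hypothetical.
# Assumption: a ddistance is -1/2 times a squared Euclidean distance.

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sq_dist_to_centroids(D, labels, K, d):
    """Squared distance of the new point from each group centroid."""
    out = []
    for k in range(K):
        idx = [i for i, g in enumerate(labels) if g == k]
        n_k = len(idx)
        # Within-group average ddistance, zero diagonal included.
        D_kk = sum(D[i][j] for i in idx for j in idx) / n_k ** 2
        out.append(D_kk - 2.0 * sum(d[i] for i in idx) / n_k)
    return out

X = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0),
     (5.0, 5.0), (7.0, 5.0),
     (1.0, 9.0), (3.0, 9.0), (2.0, 6.0)]
labels = [0, 0, 0, 1, 1, 2, 2, 2]
K = 3
P = (4.0, 1.0)  # the point to be added

D = [[-0.5 * sq_dist(a, b) for b in X] for a in X]
d = [-0.5 * sq_dist(P, x) for x in X]   # ddistances from P to every sample

delta_sq = sq_dist_to_centroids(D, labels, K, d)
delta = [-0.5 * v for v in delta_sq]     # ddistance vector

# Direct check against explicitly computed centroids.
direct = []
for k in range(K):
    pts = [x for x, g in zip(X, labels) if g == k]
    centroid = tuple(sum(c) / len(pts) for c in zip(*pts))
    direct.append(sq_dist(P, centroid))

max_gap = max(abs(v - w) for v, w in zip(delta_sq, direct))
print(max_gap)  # expected to be zero up to rounding error
```

Here `d` is a new sample; passing a column of `D` instead would place one of the original samples relative to the centroids.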
