Multidimensional scaling and nonlinear biplots - Understanding Biplots

Gower and Legendre (1986) discuss various Euclidean embeddable distance measures.

The square root of the Manhattan distance,

$$d_{ij} = \left( \sum_{k=1}^{p} |x_{ik} - x_{jk}| \right)^{1/2}, \qquad (5.5)$$

is one example; another is Clark's distance (Gower and Ngouenet, 2005), defined for nonnegative values $x_{ik}$, $x_{jk}$ by

$$d_{ij} = \left\{ \sum_{k=1}^{p} \left( \frac{x_{ik} - x_{jk}}{x_{ik} + x_{jk}} \right)^{2} \right\}^{1/2}. \qquad (5.6)$$
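As a minimal sketch of the two distance measures, assuming NumPy and taking Clark's distance to be the square root of the summed squared ratios $(x_{ik} - x_{jk})/(x_{ik} + x_{jk})$, the following computes both distance matrices. The function names and the small data matrix `X_demo` are hypothetical and are not the data matrix (5.8) of the text.

```python
import numpy as np

def sqrt_manhattan(X):
    """Square root of the Manhattan distance between all pairs of rows."""
    X = np.asarray(X, dtype=float)
    abs_diff = np.abs(X[:, None, :] - X[None, :, :])   # shape (n, n, p)
    return np.sqrt(abs_diff.sum(axis=2))

def clark(X):
    """Clark's distance between all pairs of rows.

    Assumes strictly positive entries so that x_ik + x_jk is never zero.
    """
    X = np.asarray(X, dtype=float)
    num = X[:, None, :] - X[None, :, :]
    den = X[:, None, :] + X[None, :, :]
    return np.sqrt(((num / den) ** 2).sum(axis=2))

# Hypothetical 3 x 2 data matrix, for illustration only
X_demo = np.array([[1.0, 2.0],
                   [2.0, 4.0],
                   [3.0, 1.0]])

D_manh = sqrt_manhattan(X_demo)
D_clark = clark(X_demo)
```

Both functions return symmetric n x n matrices with a zero diagonal, which is the form of input the double-centring step expects.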

The nonlinear biplot (Gower and Harding, 1988) is a generalization of the PCA biplot, providing for distance measures other than Pythagorean distance. Let $X : n \times p$ be a matrix of n samples with observations on p variables and let $d_{ij}$ indicate the distance between samples $x_i$ and $x_j$. The matrix $D = \{-\tfrac{1}{2} d_{ij}^{2}\}$ is used to define the double-centred matrix

$$B = \left(I - \tfrac{1}{n}\mathbf{1}\mathbf{1}'\right) D \left(I - \tfrac{1}{n}\mathbf{1}\mathbf{1}'\right). \qquad (5.7)$$
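The double centring in (5.7) can be sketched as follows, assuming NumPy and reading D as the matrix with entries $-\tfrac{1}{2} d_{ij}^{2}$. The function name `double_centre` and the three points in `pts` are hypothetical. For ordinary Pythagorean distances the result should coincide with the inner-product matrix of the column-centred coordinates, which gives a simple check.

```python
import numpy as np

def double_centre(dist):
    """Return B = (I - 11'/n) D (I - 11'/n), where D = {-d_ij^2 / 2}."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    D = -0.5 * dist ** 2
    J = np.eye(n) - np.ones((n, n)) / n    # centring matrix I - 11'/n
    return J @ D @ J

# Hypothetical points; Pythagorean distances between them
pts = np.array([[0.0, 0.0],
                [3.0, 0.0],
                [0.0, 4.0]])
dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2))

B = double_centre(dist)

# Check: B equals the Gram matrix of the centred coordinates
pts_centred = pts - pts.mean(axis=0)
gram = pts_centred @ pts_centred.T
```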

If B is positive semi-definite it can be expressed as $B = Y^{*}Y^{*\prime}$. The rows of $Y^{*}$ provide coordinates that generate the distances $d_{ij}$, and therefore a sufficient condition for Euclidean embeddability is that B be positive semi-definite. It is also a necessary condition. A slight generalization is that a necessary and sufficient condition for Euclidean embeddability is that $B = (I - \mathbf{1}s')D(I - s\mathbf{1}')$ be positive semi-definite for any s such that $s'\mathbf{1} = 1$ and $s'D \neq \mathbf{0}'$. For a proof, see Gower (1982). The choice of s centres $Y^{*}$ so that $s'Y^{*} = \mathbf{0}'$; in particular, when $s = n^{-1}\mathbf{1}$, as in (5.7), we have $\mathbf{1}'Y^{*} = \mathbf{0}'$, so the origin of $Y^{*}$ is at its centroid.
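The embeddability condition can be checked numerically by inspecting the smallest eigenvalue of B. The sketch below assumes NumPy, reads D as $\{-\tfrac{1}{2} d_{ij}^{2}\}$, and uses a small relative tolerance to absorb rounding error; the function names, the tolerance and both example distance matrices are hypothetical choices. The second example violates the triangle inequality, so it cannot be embedded.

```python
import numpy as np

def centred_B(dist, s=None):
    """Return (I - 1s') D (I - s1') with D = {-d_ij^2 / 2}.

    With s=None the default s = 1/n gives the double centring of (5.7).
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    one = np.ones(n)
    s = one / n if s is None else np.asarray(s, dtype=float)
    if not np.isclose(s.sum(), 1.0):
        raise ValueError("s must satisfy s'1 = 1")
    D = -0.5 * dist ** 2
    P = np.eye(n) - np.outer(one, s)       # I - 1s'
    return P @ D @ P.T                     # P.T is I - s1'

def is_euclidean(dist, s=None, tol=1e-10):
    """True if the centred matrix is positive semi-definite up to tol."""
    eigvals = np.linalg.eigvalsh(centred_B(dist, s))
    scale = max(1.0, np.abs(eigvals).max())
    return bool(eigvals.min() >= -tol * scale)

# Hypothetical Pythagorean distances: embeddable by construction
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
good = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2))

# Hypothetical dissimilarities with d_13 > d_12 + d_23: not embeddable
bad = np.array([[0.0, 1.0, 3.0],
                [1.0, 0.0, 1.0],
                [3.0, 1.0, 0.0]])

s_first = np.array([1.0, 0.0, 0.0])        # places the origin at sample 1
```

With `s_first`, `s_first @ centred_B(good, s_first)` is a zero vector, which reflects the statement that the choice of s fixes the origin of the coordinates.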

The matrix B defined by (5.7) is invariant with respect to orthogonal transformations applied to $Y^{*}$, expressing the well-known property of invariance of distances to orthogonal rotations. In principal coordinate analysis, $Y^{*}$ is given by the eigenvectors satisfying $BY = Y\Lambda$, scaled so that $Y'Y = \Lambda$; that is, the SVD $B = V\Lambda V'$ provides $Y = V\Lambda^{1/2}$. Then, because $Y'Y$ is diagonal, Y is referred to principal axes through the centroid,

as in PCA. As a simple illustration of Euclidean embeddability, let us consider the

data matrix

(5.8)

graphically represented in Figure 5.3. If, instead of calculating ordinary Pythagorean

distances between the four samples, we calculate Clark's distance, we obtain the following

distances

$D^{*} = \{d_{ij}\}$, with the six pairwise values 0.39, 0.60, 0.33, 0.60, 0.69 and 0.33.
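The principal coordinate construction $Y = V\Lambda^{1/2}$ described above can be sketched as follows, assuming NumPy. Because the data matrix (5.8) is not reproduced here, the example uses four hypothetical points with Pythagorean distances, for which the recovered coordinates should regenerate the input distances exactly. The function names and the eigenvalue tolerance are assumptions made for illustration.

```python
import numpy as np

def pairwise_euclid(Z):
    """Pythagorean distances between all pairs of rows of Z."""
    Z = np.asarray(Z, dtype=float)
    return np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2))

def principal_coordinates(dist, tol=1e-10):
    """Return (Y, lam) with Y = V lam^(1/2) from the double-centred matrix B.

    Only eigenvalues that are clearly positive are kept, so this sketch
    assumes the distances are Euclidean embeddable.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = J @ (-0.5 * dist ** 2) @ J
    lam, V = np.linalg.eigh(B)             # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]          # largest eigenvalue first
    lam, V = lam[order], V[:, order]
    keep = lam > tol * np.abs(lam).max()
    return V[:, keep] * np.sqrt(lam[keep]), lam[keep]

# Hypothetical four samples in two dimensions
pts = np.array([[0.0, 0.0],
                [2.0, 0.0],
                [2.0, 1.0],
                [0.0, 3.0]])
dist = pairwise_euclid(pts)

Y, lam = principal_coordinates(dist)
dist_from_Y = pairwise_euclid(Y)           # should reproduce dist
```

Here `Y.T @ Y` is the diagonal matrix of retained eigenvalues and the columns of `Y` sum to zero, so the coordinates are referred to principal axes through the centroid.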
