Gower and Legendre (1986) discuss various Euclidean embeddable distance measures.
The square root of the Manhattan distance,
$$d_{ij} = \Bigl( \sum_{k=1}^{p} |x_{ik} - x_{jk}| \Bigr)^{1/2}, \tag{5.5}$$
is one example; another is Clark's distance (Gower and Ngouenet, 2005), defined for
nonnegative values $x_{ik}$, $x_{jk}$ by
$$d_{ij} = \Bigl\{ \sum_{k=1}^{p} \Bigl( \frac{x_{ik} - x_{jk}}{x_{ik} + x_{jk}} \Bigr)^{2} \Bigr\}^{1/2}. \tag{5.6}$$
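As a minimal sketch of how (5.5) and (5.6) might be computed for all pairs of rows of a data matrix, the following uses NumPy broadcasting. The function names `sqrt_manhattan` and `clark` and the toy data `X_toy` are illustrative assumptions, not part of the source, and `clark` assumes that no pair of values sums to zero.

```python
import numpy as np

def sqrt_manhattan(X):
    """Square root of the Manhattan distance between all pairs of rows, as in (5.5)."""
    X = np.asarray(X, dtype=float)
    diff = np.abs(X[:, None, :] - X[None, :, :])   # |x_ik - x_jk| for every i, j, k
    return np.sqrt(diff.sum(axis=2))

def clark(X):
    """Clark's distance between all pairs of rows, as in (5.6).

    Assumes nonnegative data with x_ik + x_jk > 0 for every pair (no zero sums).
    """
    X = np.asarray(X, dtype=float)
    num = X[:, None, :] - X[None, :, :]            # x_ik - x_jk
    den = X[:, None, :] + X[None, :, :]            # x_ik + x_jk
    return np.sqrt(((num / den) ** 2).sum(axis=2))

# Hypothetical toy data: three samples observed on two variables.
X_toy = np.array([[1.0, 2.0],
                  [3.0, 2.0],
                  [1.0, 6.0]])
d_manhattan = sqrt_manhattan(X_toy)
d_clark = clark(X_toy)
```

Both functions return a symmetric $n \times n$ array with a zero diagonal, which is the form needed for the double-centring step that follows.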
The nonlinear biplot (Gower and Harding, 1988) is a generalization of the PCA biplot,
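providing for distance measures other than Pythagorean distance. Let $\mathbf{X} : n \times p$ be a matrix
of $n$ samples with observations on $p$ variables and let $d_{ij}$ indicate the distance between
samples $\mathbf{x}_i$ and $\mathbf{x}_j$. The matrix $\mathbf{D} = \{-\tfrac{1}{2} d_{ij}^{2}\}$
is used to define the double-centred matrix
$$\mathbf{B} = \bigl(\mathbf{I} - \tfrac{1}{n}\mathbf{1}\mathbf{1}'\bigr)\,\mathbf{D}\,\bigl(\mathbf{I} - \tfrac{1}{n}\mathbf{1}\mathbf{1}'\bigr). \tag{5.7}$$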
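If $\mathbf{B}$ is positive semi-definite it can be expressed as $\mathbf{B} = \mathbf{Y}\mathbf{Y}'$. The rows of $\mathbf{Y}$
provide coordinates that generate the distances $d_{ij}$, and therefore a sufficient condition
for Euclidean embeddability is that $\mathbf{B}$ be positive semi-definite. It is also a necessary
condition. A slight generalization is that a necessary and sufficient condition for Euclidean
embeddability is that $\mathbf{B} = (\mathbf{I} - \mathbf{1}\mathbf{s}')\,\mathbf{D}\,(\mathbf{I} - \mathbf{s}\mathbf{1}')$
be positive semi-definite for any $\mathbf{s}$ such that
$\mathbf{s}'\mathbf{1} = 1$ and $\mathbf{s}'\mathbf{D} \neq \mathbf{0}$. For a proof, see Gower (1982). The choice of $\mathbf{s}$ centres $\mathbf{Y}$ so that
$\mathbf{s}'\mathbf{Y} = \mathbf{0}$; in particular, when $\mathbf{s} = n^{-1}\mathbf{1}$, as in (5.7), we have $\mathbf{1}'\mathbf{Y} = \mathbf{0}$, so the origin of
$\mathbf{Y}$ is at its centroid.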
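The embeddability condition can be checked numerically. The sketch below forms $(\mathbf{I} - \mathbf{1}\mathbf{s}')\mathbf{D}(\mathbf{I} - \mathbf{s}\mathbf{1}')$ for a given distance table and tests whether its smallest eigenvalue is nonnegative up to rounding. The function names, the tolerance `tol` and the two small distance tables are assumptions made for illustration; they do not come from the source.

```python
import numpy as np

def centred_matrix(d, s=None):
    """Return B = (I - 1 s') D (I - s 1') with D = {-1/2 d_ij^2}.

    d is a symmetric matrix of distances; s must satisfy s'1 = 1.
    When s is omitted, s = 1/n is used, which gives the double-centring (5.7).
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    s = np.full(n, 1.0 / n) if s is None else np.asarray(s, dtype=float)
    D = -0.5 * d ** 2
    left = np.eye(n) - np.outer(np.ones(n), s)   # I - 1 s'
    return left @ D @ left.T                      # left.T equals I - s 1'

def is_embeddable(d, s=None, tol=1e-10):
    """True if B has no eigenvalue below -tol (tol is an assumed rounding allowance)."""
    eigvals = np.linalg.eigvalsh(centred_matrix(d, s))
    return bool(eigvals.min() >= -tol)

# Hypothetical distances between three points on a line at 0, 1 and 2.
d_line = np.array([[0.0, 1.0, 2.0],
                   [1.0, 0.0, 1.0],
                   [2.0, 1.0, 0.0]])
# Hypothetical table that violates the triangle inequality (3 > 1 + 1).
d_bad = np.array([[0.0, 1.0, 3.0],
                  [1.0, 0.0, 1.0],
                  [3.0, 1.0, 0.0]])

print(is_embeddable(d_line), is_embeddable(d_bad))
```

Passing, say, `s = np.array([1.0, 0.0, 0.0])` places the origin at the first sample instead of the centroid; the verdict on embeddability should not change, only the centring of the resulting coordinates.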
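The matrix $\mathbf{B}$ defined by (5.7) is invariant with respect to orthogonal transformations
applied to $\mathbf{Y}$, expressing the well-known property of invariance of distances to orthogonal
rotations. In principal coordinate analysis, $\mathbf{Y}$ is given by the eigenvectors satisfying
$\mathbf{B}\mathbf{Y} = \mathbf{Y}\boldsymbol{\Lambda}$,
scaled so that $\mathbf{Y}'\mathbf{Y} = \boldsymbol{\Lambda}$; that is, the SVD $\mathbf{B} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}'$ provides $\mathbf{Y} = \mathbf{V}\boldsymbol{\Lambda}^{1/2}$.
Then, because $\mathbf{Y}'\mathbf{Y}$ is diagonal, $\mathbf{Y}$ is referred to principal axes through the centroid,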
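as in PCA. As a simple illustration of Euclidean embeddability, let us consider the
data matrix
$$\mathbf{X} = \begin{array}{c} a \\ b \\ c \\ d \end{array} \begin{pmatrix} 6 & 4 \\ 4 & 8 \\ 4 & 2 \\ 2 & 2 \end{pmatrix} \tag{5.8}$$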
graphically represented in Figure 5.3. If, instead of calculating ordinary Pythagorean
distances between the four samples, we calculate Clark's distance, we obtain the following
distances
$\{d_{ij}\}$:
$$\mathbf{D} = \begin{pmatrix} 0 & 0.39 & 0.39 & 0.60 \\ 0.39 & 0 & 0.60 & 0.69 \\ 0.39 & 0.60 & 0 & 0.33 \\ 0.60 & 0.69 & 0.33 & 0 \end{pmatrix}$$
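The worked example can be reproduced with a short NumPy sketch: it computes Clark's distance (5.6) for the four samples in (5.8), double-centres as in (5.7), and obtains principal coordinates from the eigendecomposition of $\mathbf{B}$. The variable names and the cut-off `1e-10` used to discard zero eigenvalues are assumptions of this sketch, not part of the source.

```python
import numpy as np

# Data matrix (5.8): samples a, b, c, d observed on two variables.
X = np.array([[6.0, 4.0],   # a
              [4.0, 8.0],   # b
              [4.0, 2.0],   # c
              [2.0, 2.0]])  # d

# Clark's distance (5.6) between every pair of rows.
num = X[:, None, :] - X[None, :, :]
den = X[:, None, :] + X[None, :, :]
d = np.sqrt(((num / den) ** 2).sum(axis=2))
d_rounded = np.round(d, 2)

# Double-centre the matrix {-1/2 d_ij^2} as in (5.7).
n = d.shape[0]
J = np.eye(n) - np.full((n, n), 1.0 / n)   # I - (1/n) 1 1'
B = J @ (-0.5 * d ** 2) @ J

# Principal coordinates: B = V Lambda V', Y = V Lambda^{1/2}.
lam, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
lam, V = lam[::-1], V[:, ::-1]             # reorder to descending
keep = lam > 1e-10                         # assumed cut-off for "zero" eigenvalues
Y = V[:, keep] * np.sqrt(lam[keep])

# Distances regenerated from the rows of Y.
d_hat = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)

print(d_rounded)
print(np.allclose(d_hat, d))
```

If no eigenvalue of `B` is negative beyond rounding error, the rows of `Y` regenerate the Clark distances, which is the sense in which these distances are Euclidean embeddable.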