Gower and Legendre (1986) discuss various Euclidean embeddable distance measures.
The square root of the Manhattan distance,
\[
d_{ij} = \left( \sum_{k=1}^{p} |x_{ik} - x_{jk}| \right)^{1/2}, \tag{5.5}
\]
is one example; another is Clark's distance (Gower and Ngouenet, 2005), defined for
nonnegative values
$x_{ik}, x_{jk}$ by
\[
d_{ij} = \left( \sum_{k=1}^{p} \left( \frac{x_{ik} - x_{jk}}{x_{ik} + x_{jk}} \right)^{2} \right)^{1/2}. \tag{5.6}
\]
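The two distances in (5.5) and (5.6) can be written down directly. The following is a minimal sketch in plain Python; the function names and the two sample vectors are hypothetical, chosen only for illustration, and the sketch assumes that no variable is zero in both samples (otherwise the denominator in Clark's distance vanishes).

```python
import math


def sqrt_manhattan(xi, xj):
    """Square root of the Manhattan distance between two samples, as in (5.5)."""
    return math.sqrt(sum(abs(a - b) for a, b in zip(xi, xj)))


def clark(xi, xj):
    """Clark's distance between two samples of nonnegative values, as in (5.6)."""
    if any(a < 0 or b < 0 for a, b in zip(xi, xj)):
        raise ValueError("Clark's distance is defined for nonnegative values only")
    return math.sqrt(sum(((a - b) / (a + b)) ** 2 for a, b in zip(xi, xj)))


# Hypothetical samples on p = 2 variables (not taken from the text).
u = [1.0, 3.0]
v = [3.0, 1.0]
d_manhattan = sqrt_manhattan(u, v)  # sqrt(|1 - 3| + |3 - 1|) = sqrt(4)
d_clark = clark(u, v)               # sqrt((2/4)^2 + (2/4)^2) = sqrt(0.5)
```

Both functions are symmetric in their arguments and return zero for identical samples, as any distance measure should.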
The nonlinear biplot (Gower and Harding, 1988) is a generalization of the PCA biplot,
providing for distance measures other than Pythagorean distance. Let
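$\mathbf{X}: n \times p$ be a matrix of $n$ samples with observations on $p$ variables and let $d_{ij}$ indicate the distance between samples $\mathbf{x}_i$ and $\mathbf{x}_j$. The matrix $\mathbf{D} = \{-\tfrac{1}{2} d_{ij}^{2}\}$ is used to define the double-centred matrix
\[
\mathbf{B} = (\mathbf{I} - n^{-1}\mathbf{1}\mathbf{1}')\,\mathbf{D}\,(\mathbf{I} - n^{-1}\mathbf{1}\mathbf{1}'). \tag{5.7}
\]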
If $\mathbf{B}$ is positive semi-definite it can be expressed as $\mathbf{B} = \mathbf{Y}^{*}\mathbf{Y}^{*\prime}$. The rows of $\mathbf{Y}^{*}$ provide coordinates that generate the distances $d_{ij}$, and therefore a sufficient condition for Euclidean embeddability is that $\mathbf{B}$ be positive semi-definite. It is also a necessary condition. A slight generalization is that a necessary and sufficient condition for Euclidean embeddability is that $\mathbf{B} = (\mathbf{I} - \mathbf{1}\mathbf{s}')\mathbf{D}(\mathbf{I} - \mathbf{s}\mathbf{1}')$ be positive semi-definite for any $\mathbf{s}$ such that $\mathbf{s}'\mathbf{1} = 1$ and $\mathbf{s}'\mathbf{D} \neq \mathbf{0}$. For a proof, see Gower (1982). The choice of $\mathbf{s}$ centres $\mathbf{Y}^{*}$ so that $\mathbf{s}'\mathbf{Y}^{*} = \mathbf{0}$; in particular, when $\mathbf{s} = n^{-1}\mathbf{1}$, as in (5.7), we have $\mathbf{1}'\mathbf{Y}^{*} = \mathbf{0}$, so the origin of $\mathbf{Y}^{*}$ is at its centroid.
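The embeddability condition lends itself to a numerical check: form the centred matrix for a chosen s, inspect its eigenvalues, and, if none is negative, read coordinates off the eigendecomposition. The sketch below assumes NumPy; the function names, the tolerance and both example distance matrices are hypothetical and not taken from the text.

```python
import numpy as np


def centred_matrix(dist, s=None):
    """Return B = (I - 1s') D (I - s1') with D = {-d_ij^2 / 2}."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    if s is None:
        s = np.full(n, 1.0 / n)  # s = 1/n gives the double centring of (5.7)
    s = np.asarray(s, dtype=float)
    D = -0.5 * dist ** 2
    P = np.eye(n) - np.outer(np.ones(n), s)  # I - 1s'
    return P @ D @ P.T


def embedding(dist, s=None, tol=1e-10):
    """Return coordinates Y with B = Y Y', or None if B is not positive semi-definite."""
    B = centred_matrix(dist, s)
    evals, evecs = np.linalg.eigh((B + B.T) / 2)  # symmetrize against rounding error
    scale = max(np.abs(evals).max(), 1.0)
    if evals.min() < -tol * scale:
        return None  # a negative eigenvalue: not Euclidean embeddable
    keep = evals > tol * scale
    return evecs[:, keep] * np.sqrt(evals[keep])


# Hypothetical example 1: the corners of a unit square, with s chosen so that
# the origin is placed at the first point rather than at the centroid.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
dist_square = np.linalg.norm(square[:, None, :] - square[None, :, :], axis=2)
s = np.array([1.0, 0.0, 0.0, 0.0])
Y = embedding(dist_square, s)

# Hypothetical example 2: three points mutually 2 apart and a fourth point 1.1
# from each of them. The triangle inequality holds, but 1.1 is less than the
# circumradius of the triangle, so no Euclidean configuration exists.
dist_bad = np.array([
    [0.0, 2.0, 2.0, 1.1],
    [2.0, 0.0, 2.0, 1.1],
    [2.0, 2.0, 0.0, 1.1],
    [1.1, 1.1, 1.1, 0.0],
])
Y_bad = embedding(dist_bad)  # None
```

In the first example `s @ Y` is zero up to rounding, which illustrates how the choice of s fixes the origin of the coordinates; the second shows that satisfying the triangle inequality is not enough for embeddability.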
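The matrix $\mathbf{B}$ defined by (5.7) is invariant with respect to orthogonal transformations applied to $\mathbf{Y}^{*}$, expressing the well-known property of invariance of distances to orthogonal rotations. In principal coordinate analysis, $\mathbf{Y}^{*}$ is given by the eigenvectors satisfying $\mathbf{B}\mathbf{Y} = \mathbf{Y}\boldsymbol{\Lambda}$ scaled so that $\mathbf{Y}'\mathbf{Y} = \boldsymbol{\Lambda}$, that is, the SVD $\mathbf{B} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}'$ provides $\mathbf{Y} = \mathbf{V}\boldsymbol{\Lambda}^{1/2}$.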
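Then, because $\mathbf{Y}'\mathbf{Y}$ is diagonal, $\mathbf{Y}$ is referred to principal axes through the centroid, as in PCA. As a simple illustration of Euclidean embeddability, let us consider the data matrix
\[
\mathbf{X} = \begin{array}{c} a \\ b \\ c \\ d \end{array} \begin{pmatrix} 6 & 4 \\ 4 & 8 \\ 4 & 2 \\ 2 & 2 \end{pmatrix} \tag{5.8}
\]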
graphically represented in Figure 5.3. If, instead of calculating ordinary Pythagorean
distances between the four samples, we calculate Clark's distance, we obtain the following
distances
$\{d_{ij}\}$:
\[
\mathbf{D}^{*} = \begin{pmatrix}
0 & 0.39 & 0.39 & 0.60 \\
0.39 & 0 & 0.60 & 0.69 \\
0.39 & 0.60 & 0 & 0.33 \\
0.60 & 0.69 & 0.33 & 0
\end{pmatrix}
\]
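The illustration can be reproduced numerically. The sketch below, which assumes NumPy and uses variable names of our own choosing, computes Clark's distances (5.6) for the four samples of (5.8), forms the double-centred matrix (5.7), and derives principal coordinates from its eigendecomposition.

```python
import numpy as np

# Data matrix (5.8): rows are the samples a, b, c, d.
X = np.array([[6.0, 4.0], [4.0, 8.0], [4.0, 2.0], [2.0, 2.0]])
n = X.shape[0]

# Clark's distance (5.6) between every pair of rows, via broadcasting.
diff = X[:, None, :] - X[None, :, :]
total = X[:, None, :] + X[None, :, :]
dist = np.sqrt(np.sum((diff / total) ** 2, axis=2))

# Double-centred matrix (5.7), with D = {-d_ij^2 / 2}.
J = np.eye(n) - np.ones((n, n)) / n
B = J @ (-0.5 * dist ** 2) @ J

# Principal coordinates: B = V Lambda V' and Y = V Lambda^{1/2},
# keeping only the axes with a positive eigenvalue.
evals, V = np.linalg.eigh(B)
order = np.argsort(evals)[::-1]  # largest eigenvalue first
evals, V = evals[order], V[:, order]
positive = evals > 1e-10
Y = V[:, positive] * np.sqrt(evals[positive])

# Distances regenerated from the coordinates in Y.
regenerated = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)

print(np.round(dist, 2))  # Clark's distances, rounded to two decimals
print(np.round(evals, 4))  # eigenvalues of B
```

Rounded to two decimals, `dist` agrees with the matrix of distances given in the text. No eigenvalue of B is negative beyond rounding error, so these four samples are Euclidean embeddable under Clark's distance, and `regenerated` matches `dist`.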