Information Technology Reference
In-Depth Information
1.0
0.8
0.6
0.4
c
0.2
b
a
−
1.0
−
0.8
−
0.6
−
0.4
−
0.2
0.2
0.4
0.6
0.8
1.0
−
0.2
−
0.4
−
0.6
−
0.8
−
1.0
Figure 5.5
Euclidean space for samples
a
,
b
and
c
.
To embed our data matrix,
X
, into a Euclidean space, we need to find a coordinate
representation that will reproduce the matrix
D
∗
when calculating the inter-sample
Pythagorean distances. A simple way to accomplish this is as follows: we construct
a two-dimensional Euclidean space and place
a
at the origin. Sample
b
now has to
beadistanceof0.39awayfrom
a
. This is accomplished by placing
b
at coordinates
(0.39, 0) as illustrated in Figure 5.4.
Next,
c
has to be embedded in the Euclidean space such that it is at a distance
0.39 from
a
and0.60from
b
. Finding the intersection (strictly one of the two possible
intersections) of the circles with radii 0.39 and 0.60 respectively,
c
is placed in the
two-dimensional Euclidean space as shown in Figure 5.5.
To embed
d
into a Euclidean space such that it has distances 0.60, 0.69 and 0.33 from
a
,
b
and
c
respectively, it is necessary to turn to a three-dimensional Euclidean space.
The embedding of
d
is illustrated in Figure 5.6. In general,
n
samples are embedded
in an (
n
1)-dimensional Euclidean space. The process described here can be contin-
ued further, but cannot be visually represented in dimensions higher than 3. A more
straightforward algebraic way of finding a representation
Y
embedded in the Euclidean
space is to solve the eigenvector problem
B
=
YY
imposing the scaling
Y
Y
=
.This
is known as
principal coordinate analysis
(PCO) or
classical scaling
.Ofcourse,if
B
is not positive semi-definite, it will not be possible to find such a configuration
Y
.As
an example, if the distances between
a
,
b
and
c
were such that the circles centred
at
a
and
b
did not intersect, there would be no Euclidean representation where
c
can
be embedded.
−