5.2.1.2 Mahalanobis Distance
Mahalanobis distance takes into account the different variances of different features and the possible correlation between any pair of features.
The Mahalanobis distance between any two objects $x_i$ and $x_j$ is:

$d_M(x_i, x_j) = [(x_i - x_j)^T \Sigma^{-1} (x_i - x_j)]^{1/2}$   (5.3)
where Σ is the sample variance-covariance matrix.
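As a minimal sketch of Eq. 5.3 (assuming NumPy; the function name and the example vectors are ours, not from the text):

```python
import numpy as np

def mahalanobis(x_i, x_j, cov):
    """Mahalanobis distance of Eq. 5.3: [(x_i - x_j)^T Sigma^{-1} (x_i - x_j)]^{1/2}."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# With the identity covariance, Mahalanobis distance reduces to Euclidean distance.
cov = np.eye(2)
print(mahalanobis([0.0, 0.0], [3.0, 4.0], cov))  # 5.0
```

With a non-identity Σ, directions of large variance contribute less to the distance, which is exactly the weighting the text describes.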
For example, in Fig. 5.1,

$d_M(x_{P_1}, x_{C_1}) = [(x_{P_1} - x_{C_1})^T \Sigma_{C_1}^{-1} (x_{P_1} - x_{C_1})]^{1/2}$,
$d_M(x_{P_1}, x_{C_2}) = [(x_{P_1} - x_{C_2})^T \Sigma_{C_2}^{-1} (x_{P_1} - x_{C_2})]^{1/2}$.

Matrix $\Sigma_{C_j}$ is the sample variance-covariance matrix of the objects assigned to the cluster centered at $x_{C_j}$, $j = 1$ and $2$. The distance $d_M(x_{P_1}, x_{C_1}) < d_M(x_{P_1}, x_{C_2})$, so point $P_1$ is correctly assigned to the cluster centered at $C_1$. Similarly, point $P_2$ is correctly assigned to the cluster centered at $C_2$.
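The assignment rule illustrated above (assign a point to the center with the smallest Mahalanobis distance, each cluster using its own $\Sigma_{C_j}$) can be sketched as follows; the centers, covariances, and test point are hypothetical stand-ins, not the actual $P_1$, $C_1$, $C_2$ of Fig. 5.1:

```python
import numpy as np

def assign_cluster(x, centers, covs):
    """Return the index j minimizing d_M(x, x_Cj), with per-cluster covariance Sigma_Cj."""
    dists = []
    for c, cov in zip(centers, covs):
        diff = np.asarray(x, dtype=float) - np.asarray(c, dtype=float)
        dists.append(np.sqrt(diff @ np.linalg.inv(cov) @ diff))
    return int(np.argmin(dists))

# Hypothetical clusters: the first is elongated along the x-axis, the second is round.
centers = [np.array([0.0, 0.0]), np.array([4.0, 0.0])]
covs = [np.array([[9.0, 0.0], [0.0, 1.0]]), np.eye(2)]
p = np.array([2.5, 0.0])
# Euclidean distance favors the second center (1.5 < 2.5), but the wide spread of
# the first cluster makes its Mahalanobis distance smaller: 2.5/3 ≈ 0.83 vs 1.5.
print(assign_cluster(p, centers, covs))  # 0
```

This is the point of using per-cluster covariances: a point can belong to a farther center whose cluster is spread out in that direction.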
Mahalanobis distance is closely analogous to Hotelling's $T^2$ statistic, a widely used multivariate statistic that measures the weighted distance from a high-dimensional point to a population center [23]. We briefly introduce Hotelling's $T^2$ statistic here to help in understanding the Mahalanobis distance. Hotelling's $T^2$ statistic is calculated as:

$T^2 = (x_i - \bar{x})^T S^{-1} (x_i - \bar{x})$   (5.4)
where $\bar{x}$ is the sample mean and $S$ is the sample variance-covariance matrix. If point $x_i$ has a high $T^2$ statistic, then with low probability $x_i$ was generated from an underlying population whose probability density function (pdf) has sample mean $\bar{x}$ and sample variance-covariance matrix $S$. The reader can see the analogy between Eqs. 5.3 and 5.4. Assigning a point to the cluster whose center has the minimum Mahalanobis distance to the point is equivalent to assigning the point to the population for which it has the minimum Hotelling's $T^2$ statistic, i.e., the highest probability that the point was generated by that population.
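The relation between Eqs. 5.3 and 5.4 can be checked numerically: $T^2$ is exactly the squared Mahalanobis distance from $x_i$ to the sample mean. The data below are synthetic (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))          # synthetic sample from a hypothetical population
xbar = X.mean(axis=0)                  # sample mean
S = np.cov(X, rowvar=False)            # sample variance-covariance matrix

x_i = np.array([1.0, -2.0, 0.5])
diff = x_i - xbar
T2 = diff @ np.linalg.inv(S) @ diff    # Eq. 5.4
d_M = np.sqrt(T2)                      # Mahalanobis distance of x_i to the center
# T^2 equals the squared Mahalanobis distance, term for term.
assert np.isclose(T2, d_M ** 2)
```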
Users can also find the similarity between the Mahalanobis distance and the likelihood of an observation under the assumption that the observation is a sample from a multivariate normal distribution. For instance, in Fig. 5.1, the likelihood of an observation $x$ under a multivariate normal distribution with sample mean $x_{C_j}$ and sample variance-covariance matrix $\Sigma_{C_j}$ is:

$L_j(x) = \dfrac{1}{(2\pi)^{p/2} |\Sigma_{C_j}|^{1/2}} \exp\left(-\dfrac{1}{2}(x - x_{C_j})^T \Sigma_{C_j}^{-1} (x - x_{C_j})\right)$, $j = 1, 2$.

Taking Eq. 5.3 into consideration, we get

$L_j(x) = \dfrac{1}{(2\pi)^{p/2} |\Sigma_{C_j}|^{1/2}} \exp\left(-\dfrac{1}{2} d_M^2(x, x_{C_j})\right)$   (5.5)
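A small sketch of Eq. 5.5 (hypothetical mean and covariance; NumPy assumed), showing that the likelihood is a monotonically decreasing function of the Mahalanobis distance, so ranking points by likelihood for a fixed cluster is the same as ranking them by $d_M$:

```python
import numpy as np

def likelihood(x, mean, cov):
    """Multivariate normal density of Eq. 5.5, written via the Mahalanobis distance."""
    p = len(mean)
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    d_M2 = diff @ np.linalg.inv(cov) @ diff              # squared Mahalanobis distance
    norm = (2 * np.pi) ** (p / 2) * np.linalg.det(cov) ** 0.5
    return np.exp(-0.5 * d_M2) / norm

# Hypothetical cluster parameters, not the ones of Fig. 5.1.
mean = np.array([0.0, 0.0])
cov = np.array([[2.0, 0.5], [0.5, 1.0]])
# A point closer in Mahalanobis distance always has the higher likelihood.
print(likelihood([0.1, 0.0], mean, cov) > likelihood([1.0, 1.0], mean, cov))  # True
```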
The term $|\Sigma_{C_j}|$ on the right-hand side of Eq. 5.5 is the determinant of matrix $\Sigma_{C_j}$; it is also called the generalized variance [17]. From Eqs. 5.3 and 5.5, we can see