where $\|\cdot\|$ is a norm. In particular, the Mahalanobis distance with the norm,
\[ \|u\| = \langle u, \Sigma^{-1} u \rangle^{1/2}, \tag{2.2} \]
where $\Sigma$ is the covariance matrix of the data involved.
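As a rough illustration of the norm (2.2), the following sketch evaluates it for a vector in the plane, assuming a symmetric positive definite 2 x 2 covariance matrix given as nested tuples. The function names `mahalanobis_norm` and `mahalanobis_distance` are illustrative only, and the distance between two points is assumed to be the norm of their difference.

```python
import math


def mahalanobis_norm(u, cov):
    """Return <u, cov^{-1} u>^{1/2} for a 2-vector u and a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # A symmetric 2x2 matrix is positive definite iff a > 0 and det > 0.
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Inverse of the 2x2 matrix, written out explicitly.
    inv = ((d / det, -b / det), (-c / det, a / det))
    # w = cov^{-1} u
    w0 = inv[0][0] * u[0] + inv[0][1] * u[1]
    w1 = inv[1][0] * u[0] + inv[1][1] * u[1]
    # <u, w>^{1/2}
    return math.sqrt(u[0] * w0 + u[1] * w1)


def mahalanobis_distance(x, y, cov):
    """Distance between x and y, taken here as the norm of x - y (an assumption)."""
    return mahalanobis_norm((x[0] - y[0], x[1] - y[1]), cov)
```

With the identity matrix as covariance this reduces to the Euclidean norm; a larger variance along one axis shrinks distances measured along that axis.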
Example 2.1. A data set in $\mathbb{R}^2$ with $N = 1100$ data points is shown in Fig. 2.1.
The data on the left was simulated from a normal distribution $N(\mu, \Sigma)$, with:
\[ \mu_1 = (2, 0), \quad \Sigma_1 = \begin{pmatrix} 0.0005 & 0 \\ 0 & 0.05 \end{pmatrix}, \quad \text{(100 points)}, \]
and the data on the right consist of 1000 points, simulated in a circle of diameter
1 centered at $\mu_2 = (3, 0)$ according to a radially symmetric distribution with
\[ \mathrm{Prob}\{\|x - \mu_2\| \le r\} = 2r, \quad 0 \le r \le \tfrac{1}{2}. \]
This data will serve as illustration in Examples 2.2-2.3 below.
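A data set of the kind described in Example 2.1 could be generated along the following lines. This is only a sketch under stated assumptions: the covariance of the left cluster is taken to be diagonal with entries 0.0005 and 0.05, so the two coordinates are sampled independently, and the condition Prob{||x - mu_2|| <= r} = 2r on [0, 1/2] is read as a radius uniform on [0, 1/2] with a uniform angle. The function name `simulate_example` and the `seed` parameter are illustrative, and the points produced will not match those plotted in Fig. 2.1.

```python
import math
import random


def simulate_example(seed=0):
    """Return (left, right): 100 Gaussian points and 1000 points in a disc."""
    rng = random.Random(seed)

    # Left cluster: mean (2, 0), variances 0.0005 (x) and 0.05 (y), no correlation.
    sd_x, sd_y = math.sqrt(0.0005), math.sqrt(0.05)
    left = [(rng.gauss(2.0, sd_x), rng.gauss(0.0, sd_y)) for _ in range(100)]

    # Right cluster: radially symmetric around (3, 0).
    # Prob{radius <= r} = 2r on [0, 1/2] means the radius is uniform on [0, 1/2].
    right = []
    for _ in range(1000):
        r = rng.uniform(0.0, 0.5)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        right.append((3.0 + r * math.cos(theta), r * math.sin(theta)))

    return left, right
```

Because the radius, not the area, is uniform, the points of the right cluster are denser near the center (3, 0) than near the boundary of the circle.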
Fig. 2.1. A data set in $\mathbb{R}^2$.
In d-clustering each point is typically assigned to the cluster with the nearest
center. After each assignment, the cluster centers may change, requiring a
re-classification of the data points. A d-clustering algorithm will therefore
iterate between updating the centers and re-assigning the points. The best known
such method is the k-means clustering algorithm, see Hartigan [5].
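The alternation between assignments and center updates can be sketched as follows. This is a minimal version of the commonly used Lloyd-style iteration with the Euclidean distance, and it is not claimed to be the exact variant described in Hartigan [5]. The function name `kmeans`, its parameters, and the choice to keep the center of an empty cluster unchanged are assumptions made for illustration.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Alternate nearest-center assignment and center updates until stable."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to the cluster with the nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center becomes the mean of the points assigned to it.
        new_centers = []
        for k, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = tuple(
                    sum(p[i] for p in members) / len(members)
                    for i in range(len(c))
                )
                new_centers.append(mean)
            else:
                # Empty cluster: keep the old center (an illustrative choice).
                new_centers.append(c)
        # Stop once the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

For example, `kmeans([(0, 0), (0, 1), (10, 0), (10, 1)], [(0, 0), (10, 0)])` separates the two left points from the two right points. The result depends on the initial centers, so different starting values may lead to different clusterings.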