where $\|\cdot\|$ is a norm. In particular, the Mahalanobis distance is the distance with the norm

$$\|u\| := \langle u, \Sigma^{-1} u \rangle^{1/2}, \qquad (2.2)$$

where $\Sigma$ is the covariance matrix of the data involved.
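The norm (2.2) can be evaluated without forming $\Sigma^{-1}$ explicitly: solve $\Sigma y = u$ and take $\langle u, y \rangle^{1/2}$. The sketch below does this in plain Python with Gaussian elimination. The function names `solve`, `mahalanobis_norm` and `mahalanobis_distance` are hypothetical, chosen for illustration, and $\Sigma$ is assumed to be symmetric positive definite.

```python
import math

def solve(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copies the inputs so they are not modified.
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (m[r][n] - s) / m[r][r]
    return y

def mahalanobis_norm(u, sigma):
    """Return <u, Sigma^{-1} u>^{1/2}, assuming Sigma is positive definite."""
    y = solve(sigma, u)  # y = Sigma^{-1} u
    return math.sqrt(sum(ui * yi for ui, yi in zip(u, y)))

def mahalanobis_distance(x, c, sigma):
    """Distance between a point x and a center c under the norm (2.2)."""
    return mahalanobis_norm([xi - ci for xi, ci in zip(x, c)], sigma)
```

With $\Sigma = \mathrm{diag}(0.0005, 0.05)$, the point $(2.1, 0)$ is at Mahalanobis distance $\sqrt{20} \approx 4.47$ from $(2, 0)$. The point $(2, 0.1)$ is at the same Euclidean distance but only $\sqrt{0.2} \approx 0.45$ away, because the data vary far more in the second coordinate. With $\Sigma = I$ the norm reduces to the Euclidean one.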
Example 2.1. A data set in $\mathbb{R}^2$ with $N = 1100$ data points is shown in Fig. 2.1. The data on the left were simulated from a normal distribution $\mathcal{N}(\mu_1, \Sigma_1)$ with

$$\mu_1 = (2, 0), \qquad \Sigma_1 = \begin{pmatrix} 0.0005 & 0 \\ 0 & 0.05 \end{pmatrix} \quad \text{(100 points)},$$

and the data on the right consist of 1000 points, simulated in a circle of diameter 1 centered at $\mu_2 = (3, 0)$ according to a radially symmetric distribution with

$$\operatorname{Prob}\{\|x - \mu_2\| \le r\} = 2r, \qquad 0 \le r \le \tfrac{1}{2}.$$

These data will serve as an illustration in Examples 2.2–2.3 below.
Fig. 2.1. A data set in $\mathbb{R}^2$
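A data set of the same kind as in Example 2.1 can be generated as sketched below. The function name `simulate_example_2_1`, the `seed` parameter and the use of Python's `random` module are assumptions for illustration, not the author's code. Because $\Sigma_1$ is diagonal, the two coordinates of the left cluster are independent normals with variances 0.0005 and 0.05. For the right cluster, $\operatorname{Prob}\{\|x - \mu_2\| \le r\} = 2r$ on $[0, \tfrac12]$ says the radius is uniform on $[0, \tfrac12]$, and radial symmetry makes the angle uniform.

```python
import math
import random

def simulate_example_2_1(seed=0):
    """Hypothetical generator for data distributed as in Example 2.1."""
    rng = random.Random(seed)
    points = []
    # Left cluster: 100 points from N(mu_1, Sigma_1), mu_1 = (2, 0),
    # Sigma_1 = diag(0.0005, 0.05); gauss() takes a standard deviation.
    for _ in range(100):
        points.append((rng.gauss(2.0, math.sqrt(0.0005)),
                       rng.gauss(0.0, math.sqrt(0.05))))
    # Right cluster: 1000 points within distance 1/2 of mu_2 = (3, 0).
    # Radius uniform on [0, 1/2] gives Prob{||x - mu_2|| <= r} = 2r.
    for _ in range(1000):
        r = 0.5 * rng.random()
        theta = 2.0 * math.pi * rng.random()
        points.append((3.0 + r * math.cos(theta), r * math.sin(theta)))
    return points
```

A uniform radius is not the same as a uniform fill of the disc. The points end up more concentrated toward $\mu_2$, since the planar density is proportional to $1/r$.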
In d-clustering each point is typically assigned to the cluster with the nearest center. After each assignment the cluster centers may change, which requires a reclassification of the data points. A d-clustering algorithm therefore iterates between updating the centers and reassigning the points. The best-known such method is the k-means clustering algorithm; see Hartigan [5].
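The alternation between assignment and center updates can be sketched as the common Lloyd-style iteration for k-means shown below. This is an illustrative sketch, not Hartigan's algorithm as presented in [5]. The function name `kmeans`, the random initialization from data points, and the parameters `max_iter` and `seed` are assumptions. Euclidean distance is used here; any distance $d(x, c)$ could be substituted in the assignment step.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Alternate nearest-center assignment and center updates (Lloyd-style)."""
    rng = random.Random(seed)
    # Assumed initialization: k data points picked at random as centers.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the centers are fixed
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels
```

The loop stops when an assignment step changes no labels, at which point recomputing the centers would leave them unchanged. Like other iterations of this kind, the result can depend on the initial centers.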