Biology Reference
In-Depth Information
Algorithm 1. The PDQ Algorithm.
Initialization :
i n ta t
with N points,
any two centers c 1 , c 2 ,
any two cluster sizes q 1 ,q 2 ,q 1 + q 2 = N ,
> 0
D
Iteration :
Step 1
compute distances from c 1 , c 2 for all x ∈D
+
1 , q
+
2 (using (2.20))
Step 2
update the cluster sizes q
+
1 , c
+
2 (using (2.24)-(2.25))
Step 3
update the centers c
+
1
+
2
Step 4
if
c
c 1
+
c
c 2
< stop
return to step 1
The algorithm iterates between the cluster size estimates (2.20), the cluster
centers , (2.24), expressed as stationary points for the JDF (2.21), and the dis-
tances of the data points to these centers. The cluster probabilities , (2.7), are not
used explicitly.
Remarks
(a) The distances used in Step 1 can be Euclidean or elliptic (the formulas
(2.24)-(2.25) are valid in both cases.)
(b) In particular, if the Mahalanobis distance (2.2)
d ( x , c k )= ( x c k ) T Σ k ( x c k )
is used, the covariance matrix Σ k of the k th -cluster, can be estimated at
each iteration by
i =1
N
u k ( x i )( x i c k )( x i c k ) T
Σ k =
(2.33)
i =1
N
u k ( x i )
with u k ( x i ) given by (2.25).
(c) If the cluster sizes q 1 ,q 2 are known, they are used as the initial estimates
and are not updated thereafter, in other words, Step 2 is absent.
Search WWH ::




Custom Search