Biology Reference
In-Depth Information
Algorithm 1.
The PDQ Algorithm.
Initialization
:
i n ta t
with
N
points,
any two centers
c
1
,
c
2
,
any two cluster sizes
q
1
,q
2
,q
1
+
q
2
=
N
,
>
0
D
Iteration
:
Step 1
compute
distances from
c
1
,
c
2
for all
x
∈D
+
1
,
q
+
2
(using (2.20))
Step 2
update
the cluster sizes
q
+
1
,
c
+
2
(using (2.24)-(2.25))
Step 3
update
the centers
c
+
1
+
2
Step 4
if
c
−
c
1
+
c
−
c
2
<
stop
return
to step 1
The algorithm iterates between the cluster size estimates (2.20), the cluster
centers
, (2.24), expressed as stationary points for the JDF (2.21), and the
dis-
tances
of the data points to these centers. The cluster
probabilities
, (2.7), are not
used explicitly.
Remarks
(a) The distances used in Step 1 can be Euclidean or elliptic (the formulas
(2.24)-(2.25) are valid in both cases.)
(b) In particular, if the Mahalanobis distance (2.2)
d
(
x
,
c
k
)=
(
x
−
c
k
)
T
Σ
−
k
(
x
−
c
k
)
is used, the covariance matrix Σ
k
of the
k
th
-cluster, can be estimated at
each iteration by
i
=1
N
u
k
(
x
i
)(
x
i
−
c
k
)(
x
i
−
c
k
)
T
Σ
k
=
(2.33)
i
=1
N
u
k
(
x
i
)
with
u
k
(
x
i
) given by (2.25).
(c) If the cluster sizes
q
1
,q
2
are known, they are used as the initial estimates
and are not updated thereafter, in other words, Step 2 is absent.
Search WWH ::
Custom Search