Biology Reference
In-Depth Information
Note : Formula (2.30) is an optimality condition for the centers c k , expressing
them as convex combinations of the data points x i , with weights u k ( x i ) depend-
ingonthecenters c k . It is used iteratively in Step 3 of Algorithm 1 below to update
the centers, and is an extension to several facilities of the well-known Weiszfeld
iteration for facility location, see [13, 17]. This formula, and the corresponding
formulas (2.20) for the cluster sizes, are applied in [9] for solving multi-facility
location problems, subject to capacity constraints.
2.2.6. The Centers and the Joint Distance Function
The centers obtained in Theorem 2.2 are stationary points for the joint distance
function (2.16), written as a function of the cluster centers c 1 , c 2 ,
d 1 ( x i , c 1 ) d 2 ( x i , c 2 )
q 1 q 2
d 1 ( x i , c 1 )
q 1
N
F ( c 1 , c 2 )=
.
(2.31)
+ d 2 ( x i , c 2 )
q 2
i =1
Theorem 2.3. Let the distances d k ( x i , c k ) in (2.31) be elliptic. Then the station-
ary points of the function F are c 1 , c 2 given by (2.24)-(2.26).
Proof.
Using (2.27), and simplifying, we derive
d 2 ( x i ) 2
q 2
Q 1 ( x i
c 1 )
d 1 ( x i )
N
c 1 F ( c 1 , c 2 )=
(2.32)
d 1 ( x i )
q 1
2
+ d 2 ( x i )
q 2
i =1
Setting
c 1 F ( c 1 , c 2 ) equal zero, and summing like terms, we obtain the center
c 1 as in (2.24)-(2.26). The statements about c 2 are proved similarly.
2.3. The PDQ Algorithm
The above ideas are implemented in an algorithm for the unsupervised clustering
of data. We call it the PDQ Algorithm ( P for probability, D for distance and Q
for the cluster sizes.)
For
simplicity,
we
describe
the
algorithm
for
the
case
of
2
clusters.
Search WWH ::




Custom Search