Probabilistic Distance Clustering: Algorithm and Applications - Clustering Challenges in Biological Network - page 39

Biology Reference

In-Depth Information

Algorithm 1. The PDQ Algorithm.

Initialization :

i n ta t

with N points,

any two centers c 1 , c 2 ,

any two cluster sizes q 1 ,q 2 ,q 1 + q 2 = N ,

> 0

D

Iteration :

Step 1

compute distances from c 1 , c 2 for all x ∈D

+

1 , q

+

2 (using (2.20))

Step 2

update the cluster sizes q

+

1 , c

+

2 (using (2.24)-(2.25))

Step 3

update the centers c

+

1

+

2

Step 4

if

c

− c 1

+

c

− c 2

< stop

return to step 1

The algorithm iterates between the cluster size estimates (2.20), the cluster

centers , (2.24), expressed as stationary points for the JDF (2.21), and the dis-

tances of the data points to these centers. The cluster probabilities , (2.7), are not

used explicitly.

Remarks

(a) The distances used in Step 1 can be Euclidean or elliptic (the formulas

(2.24)-(2.25) are valid in both cases.)

(b) In particular, if the Mahalanobis distance (2.2)

d ( x , c k )= ( x − c k ) T Σ − k ( x − c k )

is used, the covariance matrix Σ k of the k th -cluster, can be estimated at

each iteration by

i =1

N

u k ( x i )( x i − c k )( x i − c k ) T

Σ k =

(2.33)

i =1

N

u k ( x i )

with u k ( x i ) given by (2.25).

(c) If the cluster sizes q 1 ,q 2 are known, they are used as the initial estimates

and are not updated thereafter, in other words, Step 2 is absent.

Next Page

Clustering Challenges in Biological Network

Search WWH ::

Custom Search

Home