Algorithm 2. The EM Method.
Initialization:
a data set $D$ with $N$ points, and initial guesses for the parameters $c_1, c_2, \Sigma_1, \Sigma_2, \pi$.
Iteration:
Step 1: For all $x_i \in D$ compute the "responsibilities":
$$p_1(x_i) = \frac{\pi\,\varphi_1(x_i)}{\pi\,\varphi_1(x_i) + (1-\pi)\,\varphi_2(x_i)}, \qquad p_2(x_i) = 1 - p_1(x_i).$$
Step 2: update the centers and covariances:
$$c_k = \sum_{i=1}^{N} \frac{p_k(x_i)}{\sum_{j=1}^{N} p_k(x_j)}\, x_i,$$
$$\Sigma_k = \sum_{i=1}^{N} \frac{p_k(x_i)}{\sum_{j=1}^{N} p_k(x_j)}\, (x_i - c_k)(x_i - c_k)^T, \qquad k = 1, 2.$$
Step 3: update the mixing probabilities (weights):
$$\pi = \frac{1}{N}\sum_{i=1}^{N} p_1(x_i).$$
Step 4: stop or return to Step 1.
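The iteration above can be sketched in Python/NumPy. This is a minimal illustration, not the text's own implementation: function and variable names are invented here, the densities $\varphi_1, \varphi_2$ are taken to be multivariate Gaussian (as in the mixture model of the text), and the stopping rule of Step 4 is realized as convergence of the centers under an assumed tolerance.

```python
import numpy as np

def gauss_pdf(X, mean, cov):
    """Multivariate Gaussian density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    inv = np.linalg.inv(cov)
    # squared Mahalanobis distance of each point to the center
    m2 = np.einsum('ij,jk,ik->i', diff, inv, diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * m2) / norm

def em_two_gaussians(X, c1, c2, S1, S2, pi, n_iter=100, tol=1e-6):
    """Sketch of Algorithm 2 for a two-component Gaussian mixture.
    X: (N, d) data; c1, c2: initial centers; S1, S2: initial
    covariances; pi: initial mixing probability (all assumptions
    supplied by the caller, as in the Initialization step)."""
    c = [np.asarray(c1, float), np.asarray(c2, float)]
    S = [np.asarray(S1, float), np.asarray(S2, float)]
    for _ in range(n_iter):
        # Step 1: responsibilities p_1, p_2
        phi1 = gauss_pdf(X, c[0], S[0])
        phi2 = gauss_pdf(X, c[1], S[1])
        p1 = pi * phi1 / (pi * phi1 + (1 - pi) * phi2)
        P = [p1, 1.0 - p1]
        c_old = [ck.copy() for ck in c]
        # Step 2: weighted centers and covariances
        for k in range(2):
            w = P[k] / P[k].sum()
            c[k] = w @ X
            diff = X - c[k]
            S[k] = (w[:, None] * diff).T @ diff
        # Step 3: mixing probability
        pi = p1.mean()
        # Step 4: stop when the centers have converged
        if max(np.linalg.norm(c[k] - c_old[k]) for k in range(2)) < tol:
            break
    return c, S, pi
```

On well-separated data with reasonable initial guesses, the centers converge quickly and $\pi$ approaches the fraction of points generated by the first component.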
Remarks
(a) The “responsibilities” in Step 1 correspond to the cluster membership
probabilities in Algorithm 1.
(b) Step 1 requires, for each data point, both the Mahalanobis distance (2.2) and the evaluation of the density (2.35).
(c) Step 2 is computationally similar to Step 3 of Algorithm 1.
(d) The stopping rule (Step 4) is again the convergence of centers as in Algorithm 1.
2.4.1. A Comparison of the PDQ Algorithm (Algorithm 1) and the EM Method (Algorithm 2)
(a) The EM Algorithm is based on maximum likelihood, and therefore depends on the density functions in the mix, requiring different computations for different densities. The PDQ Algorithm is parameter free,