Proof. The gradient of $d(x, c) = \langle (x - c),\, Q(x - c) \rangle^{1/2}$ with respect to $c$ is

$$
\nabla_c \, \langle (x - c),\, Q(x - c) \rangle^{1/2}
= -\frac{Q(x - c)}{\langle (x - c),\, Q(x - c) \rangle^{1/2}}
= -\frac{Q(x - c)}{d(x, c)}, \qquad (2.27)
$$

assuming $x \neq c$. Therefore, if $c_1, c_2$ do not coincide with any of the data points $x_i$, we have
$$
\nabla_{c_k} f(c_1, c_2)
= -\,Q_k \sum_{i=1}^{N} \frac{p_k(x_i)^2\, q_k}{d_k(x_i, c_k)}\,(x_i - c_k). \qquad (2.28)
$$
Setting the gradient equal to zero, "cancelling" the matrix $Q_k$ and the common factor $q_k$, and summing like terms, we get
$$
\sum_{i=1}^{N} \frac{p_k(x_i)^2}{d_k(x_i, c_k)}\, x_i
= \left( \sum_{i=1}^{N} \frac{p_k(x_i)^2}{d_k(x_i, c_k)} \right) c_k,
$$
proving (2.24) and (2.26). Substituting (2.7) in (2.26) then gives (2.25).

Note: The theorem also holds if a center coincides with a data point, if we interpret $\infty/\infty$ as 1 in (2.24).
Theorem 2.2 applies, in particular, to the Mahalanobis distance (2.4),

$$
d(x, c_k) = \left( (x - c_k)^T\, \Sigma_k^{-1}\, (x - c_k) \right)^{1/2},
$$

where $\Sigma_k$ is the (given or computed) covariance matrix of the cluster $C_k$.
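As a small numerical companion (an illustration with made-up data, not from the text), the Mahalanobis distance can be computed by solving with $\Sigma_k$ rather than forming its inverse explicitly:

```python
import numpy as np

def mahalanobis(x, c, Sigma):
    # Mahalanobis distance (2.4): ((x - c)^T Sigma^{-1} (x - c))^(1/2),
    # the elliptical distance with Q = Sigma^{-1}
    r = x - c
    return float(np.sqrt(r @ np.linalg.solve(Sigma, r)))

# illustrative toy cluster; Sigma_k estimated from the sample points
pts = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [1.0, 1.5]])
c = pts.mean(axis=0)
Sigma = np.cov(pts.T)
print(mahalanobis(np.array([2.0, 2.0]), c, Sigma))
```

With $\Sigma = I$ the formula reduces to the ordinary Euclidean distance.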
For the general case of $K$ clusters it is convenient to use the probabilistic form (2.26).
Corollary 2.1. Consider a function of $K$ centers,

$$
f(c_1, c_2, \ldots, c_K)
= \sum_{k=1}^{K} \sum_{i=1}^{N} p_k(x_i)^2\, q_k\, d_k(x_i, c_k), \qquad (2.29)
$$

an analog of (2.21). Then, under the hypotheses of Theorem 2.2, the minimizers of $f$ are
$$
c_k = \sum_{i=1}^{N} \frac{u_k(x_i)}{\sum_{t=1}^{N} u_k(x_t)}\, x_i,
\quad \text{with } u_k(x_i) = \frac{p_k(x_i)^2}{d_k(x_i, c_k)}, \qquad (2.30)
$$

for $k = 1, \ldots, K$.
Proof. Same as the proof of Theorem 2.2.
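Formula (2.30) suggests a simple fixed-point pass over the centers. Below is a minimal sketch, assuming Euclidean distances $d_k$ (i.e. $Q_k = I$) and externally supplied probabilities $p_k(x_i)$; the function name and data are illustrative, not from the text:

```python
import numpy as np

def update_centers(X, C, p):
    """One pass of the fixed-point formula (2.30):
        c_k <- sum_i u_k(x_i) x_i / sum_t u_k(x_t),
        u_k(x_i) = p_k(x_i)^2 / d_k(x_i, c_k).
    The constants q_k cancel in the normalized weights, so they are omitted.
    Assumes no center coincides with a data point, so d_k > 0.
    X: (N, n) data, C: (K, n) centers, p: (N, K) cluster probabilities."""
    newC = np.empty_like(C)
    for k in range(C.shape[0]):
        dk = np.linalg.norm(X - C[k], axis=1)  # Euclidean d_k (Q_k = I in this sketch)
        u = p[:, k] ** 2 / dk                  # weights u_k(x_i)
        newC[k] = u @ X / u.sum()              # weighted mean of the data
    return newC

# illustrative usage with uniform probabilities
X = np.array([[-1.0, 0.0], [1.0, 0.0], [4.0, 0.0], [6.0, 0.0]])
C = np.array([[0.5, 0.0], [4.5, 0.0]])
p = np.full((4, 2), 0.5)
C = update_centers(X, C, p)
```

A center that is already the weighted mean of its data is a fixed point of this update, which is exactly the content of (2.30).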