A Novel Clustering Approach: Global Optimum Search with Enhanced Positioning - Clustering Challenges in Biological Network

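\[
\begin{aligned}
\min \quad & \sum_{i=1}^{n} \sum_{j=1}^{c} \sum_{k=1}^{s} w_{ij} \left( a_{ik} - z_{jk} \right)^{2} && \text{(Problem 1)} \\
\text{s.t.} \quad & \sum_{j=1}^{c} w_{ij} = 1, && \forall\, i = 1, \ldots, n \\
& w_{ij} \text{ are binary variables}, \; z_{jk} \text{ are continuous variables}
\end{aligned}
\]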

There are two sets of variables in the problem, $w_{ij}$ and $z_{jk}$. While the bounds of $w_{ij}$ are clearly 0 and 1, those of $z_{jk}$ are obtained from the range of the $a_{ik}$ values:

\[
z_{jk}^{L} = \min_{i} \left[ a_{ik} \right], \quad \forall\, k = 1, \ldots, s
\]

\[
z_{jk}^{U} = \max_{i} \left[ a_{ik} \right], \quad \forall\, k = 1, \ldots, s
\]
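As a rough illustration of Problem 1 and the bounds on the cluster centers, the sketch below evaluates the objective for a fixed assignment and derives the bounds from the data range. The function names and the toy data are assumptions made for this example and do not come from the original text; `a[i][k]`, `w[i][j]`, and `z[j][k]` mirror the indexed symbols above.

```python
def problem1_objective(a, w, z):
    """Sum over i, j, k of w[i][j] * (a[i][k] - z[j][k])**2."""
    n, c, s = len(a), len(z), len(a[0])
    return sum(
        w[i][j] * (a[i][k] - z[j][k]) ** 2
        for i in range(n) for j in range(c) for k in range(s)
    )


def center_bounds(a, c):
    """Bounds on z[j][k] from the range of a[i][k] over i (same for every j)."""
    s = len(a[0])
    lo = [min(row[k] for row in a) for k in range(s)]
    hi = [max(row[k] for row in a) for k in range(s)]
    return [lo[:] for _ in range(c)], [hi[:] for _ in range(c)]


# Toy data (made up): n = 4 genes, s = 2 features, c = 2 clusters.
a = [[0.0, 1.0], [0.2, 0.8], [3.0, 4.0], [3.2, 4.2]]
w = [[1, 0], [1, 0], [0, 1], [0, 1]]   # binary assignment
z = [[0.1, 0.9], [3.1, 4.1]]           # candidate cluster centers

# Assignment constraint: each gene belongs to exactly one cluster.
assert all(sum(row) == 1 for row in w)

objective = problem1_objective(a, w, z)
z_lo, z_hi = center_bounds(a, c=2)
```

Because the bounds depend only on the feature index, every cluster shares the same box; this is what allows them to be fixed before any assignment is known.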

The pre-clustering work suggests that some genes need only be considered for a restricted subset of the known clusters, since it can be determined (for instance, by distance and correlation metrics) that certain genes are exceedingly dissimilar from some of the pre-clusters and thus have virtually zero probability of being clustered there. This restriction can be described by introducing an additional binary parameter $\text{suit}_{ij}$. A data point deemed to belong uniquely to one cluster has $\text{suit}_{ij} = 1$ for exactly one value of $j$ and zero for the others, whereas a data point restricted to a few clusters has $\text{suit}_{ij} = 1$ for only those clusters. This reduces the computational demands of the problem. The introduction of the $\text{suit}_{ij}$ parameters also obviates the need for constraints that prevent the redundant re-indexing of clusters.
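One way to picture the suitability parameter is the sketch below, which marks a pre-cluster as admissible for a data point when its center is not much farther away than the nearest one. The text does not state the actual rule, so the Euclidean distance, the `ratio` threshold, the function name, and the toy data are all assumptions chosen purely for illustration.

```python
import math


def suitability(a, centers, ratio=2.0):
    """Hypothetical rule: suit[i][j] = 1 if center j lies within
    `ratio` times the distance from point i to its nearest center."""
    suit = []
    for point in a:
        dists = [math.dist(point, ctr) for ctr in centers]
        cutoff = ratio * min(dists)
        suit.append([1 if d <= cutoff else 0 for d in dists])
    return suit


# Toy data (made up): three points, three pre-cluster centers.
a = [[0.0, 1.0], [3.0, 4.0], [1.6, 2.5]]
centers = [[0.1, 0.9], [3.1, 4.1], [10.0, 10.0]]

suit = suitability(a, centers)

# Only the w[i][j] with suit[i][j] = 1 need to stay in the model.
free_binaries = sum(map(sum, suit))
all_binaries = len(a) * len(centers)
```

In this toy case the first two points are pinned to a single cluster each and the third is left with two candidates, so four of the nine binary variables remain.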

Together with the first-order optimality condition (FOC), i.e., that the vector sum of the displacements from all genes within a cluster to the cluster center should be zero, the formulation becomes:

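\[
\begin{aligned}
\min \quad & \sum_{i=1}^{n} \sum_{k=1}^{s} a_{ik}^{2} - \sum_{i=1}^{n} \sum_{j=1}^{c} \sum_{k=1}^{s} (\text{suit}_{ij}) \left( a_{ik} w_{ij} z_{jk} \right) && \text{(Problem 2)} \\
\text{s.t.} \quad & (\text{suit}_{ij}) \left( z_{jk} \sum_{i=1}^{n} w_{ij} - \sum_{i=1}^{n} a_{ik} w_{ij} \right) = 0, && \forall\, j, \; \forall\, k \\
& \sum_{j=1}^{c} (\text{suit}_{ij})\, w_{ij} = 1, && \forall\, i \\
& 1 \le \sum_{i=1}^{n} (\text{suit}_{ij})\, w_{ij} \le n - c + 1, && \forall\, j \\
& w_{ij} \in \{0, 1\}, && \forall\, i, \; \forall\, j \\
& z_{jk}^{L} \le z_{jk} \le z_{jk}^{U}, && \forall\, j, \; \forall\, k
\end{aligned}
\]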
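A quick numerical check can make the link between Problem 1 and Problem 2 more concrete. Expanding the square in the Problem 1 objective and substituting the FOC leaves the sum of squared data values minus the cross term, so the two objectives should agree whenever the centers are the cluster means. The sketch below tests this on made-up data; the function names are hypothetical, and applying `suit[i][j]` inside the sums is one reading of the formulation rather than something the text confirms.

```python
def foc_centers(a, w, suit):
    """Centers satisfying the FOC: z[j][k] * sum_i w[i][j] = sum_i a[i][k] * w[i][j]."""
    n, c, s = len(a), len(w[0]), len(a[0])
    centers = []
    for j in range(c):
        members = [i for i in range(n) if suit[i][j] * w[i][j] == 1]
        centers.append(
            [sum(a[i][k] for i in members) / len(members) for k in range(s)]
        )
    return centers


def objective_problem1(a, w, z):
    n, c, s = len(a), len(z), len(a[0])
    return sum(
        w[i][j] * (a[i][k] - z[j][k]) ** 2
        for i in range(n) for j in range(c) for k in range(s)
    )


def objective_problem2(a, w, z, suit):
    n, c, s = len(a), len(z), len(a[0])
    squares = sum(x * x for row in a for x in row)
    cross = sum(
        suit[i][j] * a[i][k] * w[i][j] * z[j][k]
        for i in range(n) for j in range(c) for k in range(s)
    )
    return squares - cross


# Toy data (made up): n = 4 genes, s = 2 features, c = 2 clusters.
a = [[0.0, 1.0], [0.2, 0.8], [3.0, 4.0], [3.2, 4.2]]
w = [[1, 0], [1, 0], [0, 1], [0, 1]]
suit = [[1, 1] for _ in a]          # no cluster excluded for any gene

z = foc_centers(a, w, suit)
p1 = objective_problem1(a, w, z)
p2 = objective_problem2(a, w, z, suit)

# Cluster sizes must lie between 1 and n - c + 1.
n, c = len(a), len(z)
sizes = [sum(suit[i][j] * w[i][j] for i in range(n)) for j in range(c)]
```

The agreement holds only at centers that satisfy the FOC; for arbitrary centers the Problem 2 objective is not the within-cluster sum of squares.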

The first set of constraints is the FOC, the second requires that each gene belong to exactly one cluster, and the third states that each cluster contains at least one and no more than $n - c + 1$ data points. Note also that the $\sum_{i=1}^{n} \sum_{k=1}^{s} a_{ik}^{2}$ term in
