A Novel Clustering Approach: Global Optimum Search with Enhanced Positioning - Clustering Challenges in Biological Network

Biology Reference

In-Depth Information

constraint for cluster size can unnecessarily increase the problem size. Hence, we

can further tighten the constraint for cluster size as a plausible strategy for further

expediting the clustering of even larger data sets.

Indexing the new cluster size ranges by l, for l = 1,.,d, we introduce a new

binary variable w jl , which equals one if cluster j belongs to size class l, and zero

if otherwise. The additional constraints are then formulated as follows:

d

l =1 w jl =1 ,

∀

j

(16.9)

≤ c j =1 w jl ≤

n c −

n c + ,

∀

l

(16.10)

l =1 d l,min ≤ i =1 w ij ≤ l =1 d l,max ,

j (16.11)

The first set of constraints allows each cluster into only one size class. The

second constraint restricts the number of clusters allowable in each class. The

parameter ε is judiciously picked to allow a reasonable range over the number of

clusters in each class, for instance 10%. Finally, the third constraint bounds the

size of the clusters allowed in each class. These additional constraints also involve

the variable w ij but not z jk ; hence they are all included into the master problem.

∀

16.4. Conclusion

In our study, we propose a novel clustering algorithm (EP GOS Clust) based on

a Mixed-Integer Nonlinear Programming (MINLP) formulation. We test our pro-

posed algorithm on a substantially large dataset of gene expression patterns from

the yeast Saccharomyces Cerevisiae, and show that our method compares favor-

ably (if not outperforms) with other clustering methods in identifying data points

that are the most similar to one another as well as identifying clusters that are the

most dissimilar to one another. We also show that the EP GOS Clust is capable

of uncovering tightly-correlated clusters. Given the nature of the test datasets, we

too show that the EP GOS Clust does well in uncovering clusters with good bi-

ological coherence. In addition, we demonstrate the utility of the pre-clustering

procedure and a methodology that works in concert with the algorithm itself to

predict the optimal number of clusters. For consistency, we repeated our study

on other DNA microarray datasets based on the various glucose signaling path-

ways in the yeast Saccharomyces Cerevisiae (other results not reported here) and

obtained similar result trends.

16.5. Computational Resources

All optimization formulations are written in GAMS (General Algebraic Modeling

System) [7] and solved using the commercial solver CPLEX 8.0.

GAMS is a

Clustering Challenges in Biological Network

Search WWH ::

Custom Search

Home