A Novel Clustering Approach: Global Optimum Search with Enhanced Positioning - Clustering Challenges in Biological Network

Biology Reference

In-Depth Information

Two popular similarity metrics are correlation and Euclidean distance. The

latter is often popular, since it is intuitive, can be described by a familiar distance

function, and satisfies the triangular inequality. Clustering methods that employ

asymmetric distance measures [45, 53] are probably more difficult to intuitively

comprehend even though they may be highly suited to their intended applications.

The earliest work on clustering emphasized visual interpretations for the ease of

study, resulting in methods that utilize dendograms and color maps [11]. Other ex-

amples of clustering algorithms include: (a) Single-Link and Complete-Link Hi-

erarchical Clustering [36, 62], (b) K-Means Algorithm and its family of variants,

such as the K-Medians [30, 46, 49, 73, 74], (c) Reformulation Linearization-based

Clustering [1, 60], (d) Fuzzy Clustering [6, 15, 55, 59] , (e) Quality Cluster Algo-

rithm (QTClust) [32] , (f) Graph-Theoretic Clustering [26, 70, 72], (g) Mixture-

Resolving Clustering Method [13, 37], (h) Mode Seeking Algorithms [37], (i)

Artificial Neural Networks for Clustering [9, 42] such as the Self-Organizing Map

(SOM) [43] and a variant that combines the SOM with hierarchical clustering,

the Self-Organizing Tree Algorithm (SOTA) [31], (j) Information-Based Cluster-

ing [56, 61, 67] , (k) Stochastic Approaches [40, 48, 50].

In some instances such as gene expression data, clustering results could be

affected when there exists a number of experimental conditions in which gene

activity, even amongst those known to be closely associated, are uncorrelated.

For this reason, there has also been a number of biclustering algorithms that al-

lows for simultaneous clustering of the rows and columns of a matrix. Though

first described in [10, 29] were the first to apply it to gene expression data by

defining a score for each candidate bicluster and developing heuristics to solve

the resultant constrained optimization problem. Since then, much of the research

on biclustering has been directed towards biological applications. Additional ex-

amples of biclustering algorithms include: (a) Coupled two-way clustering [24],

(b) Iterative Signature Algorithm [5, 34], (c) Statistical-Algorithmic Method for

Bicluster Analysis (SAMBA) [66], (d) Plaid Model [44], (e) Spectral bicluster-

ing approaches such as the study assuming a hidden checkerboard-like structure

within the expression matrix reported in [41], (f) Probabilistic models such as that

reported in [58], (g) the method of [4] in defining a bicluster as an order preserving

submatrix. More recently, [8] present a fractional 0-1 programming approach to

identify selected features for consistent biclustering, and [14] develop a rigorous

biclustering approach OREO, which is based on the optimal reordering of rows

and columns in a data matrix using a network flow model.

In this paper, we present a novel Mixed-Integer Nonlinear Programming

(MINLP)-based clustering algorithm, the Global Optimal Search with Enhanced

Positioning (EP GOS Clust), which is robust yet intuitive [64, 65].

This algo-

Search WWH ::

Custom Search

Home