Biology Reference
In-Depth Information
Two popular similarity metrics are correlation and Euclidean distance. The
latter is often popular, since it is intuitive, can be described by a familiar distance
function, and satisfies the triangular inequality. Clustering methods that employ
asymmetric distance measures [45, 53] are probably more difficult to intuitively
comprehend even though they may be highly suited to their intended applications.
The earliest work on clustering emphasized visual interpretations for the ease of
study, resulting in methods that utilize dendograms and color maps [11]. Other ex-
amples of clustering algorithms include: (a) Single-Link and Complete-Link Hi-
erarchical Clustering [36, 62], (b) K-Means Algorithm and its family of variants,
such as the K-Medians [30, 46, 49, 73, 74], (c) Reformulation Linearization-based
Clustering [1, 60], (d) Fuzzy Clustering [6, 15, 55, 59] , (e) Quality Cluster Algo-
rithm (QTClust) [32] , (f) Graph-Theoretic Clustering [26, 70, 72], (g) Mixture-
Resolving Clustering Method [13, 37], (h) Mode Seeking Algorithms [37], (i)
Artificial Neural Networks for Clustering [9, 42] such as the Self-Organizing Map
(SOM) [43] and a variant that combines the SOM with hierarchical clustering,
the Self-Organizing Tree Algorithm (SOTA) [31], (j) Information-Based Cluster-
ing [56, 61, 67] , (k) Stochastic Approaches [40, 48, 50].
In some instances such as gene expression data, clustering results could be
affected when there exists a number of experimental conditions in which gene
activity, even amongst those known to be closely associated, are uncorrelated.
For this reason, there has also been a number of biclustering algorithms that al-
lows for simultaneous clustering of the rows and columns of a matrix. Though
first described in [10, 29] were the first to apply it to gene expression data by
defining a score for each candidate bicluster and developing heuristics to solve
the resultant constrained optimization problem. Since then, much of the research
on biclustering has been directed towards biological applications. Additional ex-
amples of biclustering algorithms include: (a) Coupled two-way clustering [24],
(b) Iterative Signature Algorithm [5, 34], (c) Statistical-Algorithmic Method for
Bicluster Analysis (SAMBA) [66], (d) Plaid Model [44], (e) Spectral bicluster-
ing approaches such as the study assuming a hidden checkerboard-like structure
within the expression matrix reported in [41], (f) Probabilistic models such as that
reported in [58], (g) the method of [4] in defining a bicluster as an order preserving
submatrix. More recently, [8] present a fractional 0-1 programming approach to
identify selected features for consistent biclustering, and [14] develop a rigorous
biclustering approach OREO, which is based on the optimal reordering of rows
and columns in a data matrix using a network flow model.
In this paper, we present a novel Mixed-Integer Nonlinear Programming
(MINLP)-based clustering algorithm, the Global Optimal Search with Enhanced
Positioning (EP GOS Clust), which is robust yet intuitive [64, 65].
This algo-
Search WWH ::




Custom Search