inter-cluster dissimilarity, often considered two key tenets of clustering. Furthermore, our algorithm can predict the optimal number of clusters, and the biological coherence of the predicted clusters is assessed using the Gene Ontology.
16.1. Introduction
The aim of cluster analysis is to establish a set of clusters such that the data points
in a cluster are more similar to one another than they are to those in other clusters.
The clustering problem is old: it can be traced back to Aristotle and was studied extensively by 18th-century naturalists such as Buffon, Cuvier, and Linné [28]. Since then, clustering has been used in many disciplines, such as market research, social network analysis, and geology, reflecting its broad appeal and utility as a key step in exploratory data analysis [37]. In market research, for instance, cluster analysis is widely used with multivariate data from surveys and test panels; market researchers use it to segment markets, determine target markets, and position new products. Cluster analysis is also used in market-based approaches to establishing the value of a business enterprise, and [38] discusses its potential role and utility in transfer pricing practices. Given the importance of clustering, a substantial body of work, including [17, 29, 36] and review papers such as [71], has been published on this subject.
In biology, clustering provides insights into transcriptional networks, physiological responses, gene identification, genome organization, and protein structure. Genome-wide measurements of mRNA expression levels have provided an efficient and comprehensive means of gathering information on gene function and transcriptional networks. However, extracting useful information from the resulting large data sets first requires organizing genes by the pattern and/or intensity of their expression in order to identify the genes that are co-regulated. Such groupings provide a basis for extracting the regulatory motifs of the transcription factors that drive the diverse expression patterns, allowing predictive transcriptional networks to be assembled [3]. They also provide insights into the functions of unknown genes, since functionally related genes are often co-regulated [68]. Furthermore, clustering array data can identify distinct categories of otherwise indistinguishable cell types, which can have profound implications for processes such as disease progression [63]. In sequence analysis, clustering is used to group homologous sequences into gene families. Examining characteristic DNA fragments helps identify gene structures and reading frames. In protein structure prediction, clustering an ensemble of low-energy conformers is used to identify the best candidate protein structures.
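Organizing genes by the *pattern* of their expression and by its *intensity* can lead to different groupings. The following is a minimal sketch, assuming Euclidean distance as the intensity-sensitive measure and Pearson correlation distance (1 - r) as the pattern-sensitive measure; these are common choices in expression analysis, not ones the chapter specifies, and the three gene profiles are hypothetical.

```python
from math import dist, sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_distance(x, y):
    """1 - r: 0 for identical patterns, 2 for perfectly opposite patterns."""
    return 1.0 - pearson(x, y)

# Hypothetical expression profiles measured under four conditions.
g1 = [1.0, 2.0, 4.0, 2.0]
g2 = [10.0, 20.0, 40.0, 20.0]  # same pattern as g1, ten times the intensity
g3 = [4.0, 2.0, 1.0, 2.0]      # intensity similar to g1, opposite pattern

# Euclidean distance (intensity): g1 is closer to g3 than to g2.
print(dist(g1, g2), dist(g1, g3))
# Correlation distance (pattern): g1 is closer to g2 than to g3.
print(correlation_distance(g1, g2), correlation_distance(g1, g3))
```

Under the Euclidean measure, g1 would be grouped with g3. Under the correlation measure, g1 would be grouped with g2, the gene whose expression rises and falls in step with it.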