rithm is significant in that it is able to progressively identify and weed out outlier
data points. Also, our algorithm contains a convenient method to predict the
optimal number of clusters. We compare our algorithm with several approaches
commonly used in clustering biological microarray data, namely K-means,
QT-Clust, SOM, and SOTA. We assess the results with two criteria, the
intra-cluster and inter-cluster error sums, and also examine the difference between
the two error sums. In an optimal cluster configuration, the intra-cluster error sum
is to be minimized and the inter-cluster error sum to be maximized. In this respect,
we show that our proposed algorithm compares favorably. In addition, in view of the
context of the particular test dataset used, we compare the strength of biological
coherence uncovered by the various approaches using Gene Ontology resources,
and also the level of correlation between data points within the same cluster. We
base our comparative study on actual DNA microarray data, though our algorithm
can be readily applied to data from other applications.
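The two assessment criteria can be illustrated with a small sketch. The text above does not give formulas for the error sums, so the definitions used here are assumptions made for illustration only: the intra-cluster error sum is taken as the sum of squared Euclidean distances from each data point to the centroid of its own cluster, and the inter-cluster error sum as the sum of squared distances between the centroids of every pair of clusters. The function and variable names (error_sums, profiles, labels) are hypothetical.

```python
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def error_sums(profiles, labels):
    """Return (intra, inter) error sums for one clustering.

    ASSUMED definitions (not taken from the text):
      intra = sum over points of squared distance to the own-cluster centroid
      inter = sum over unordered cluster pairs of squared centroid distance
    """
    clusters = defaultdict(list)
    for profile, label in zip(profiles, labels):
        clusters[label].append(profile)
    centroids = {label: centroid(pts) for label, pts in clusters.items()}

    intra = sum(sq_dist(p, centroids[lab]) for p, lab in zip(profiles, labels))

    ordered = sorted(centroids)
    inter = sum(
        sq_dist(centroids[a], centroids[b])
        for i, a in enumerate(ordered)
        for b in ordered[i + 1:]
    )
    return intra, inter


# Toy data: four two-dimensional "expression profiles" forming two tight groups.
profiles = [[0, 0], [0, 2], [10, 0], [10, 2]]
good_labels = [0, 0, 1, 1]  # groups nearby points together
poor_labels = [0, 1, 0, 1]  # mixes the two groups

for name, labels in (("good", good_labels), ("poor", poor_labels)):
    intra, inter = error_sums(profiles, labels)
    print(name, "intra:", intra, "inter:", inter, "difference:", inter - intra)
```

Under these assumed definitions, the grouping that keeps nearby points together yields a small intra-cluster sum and a large inter-cluster sum, while the mixed grouping reverses the two, which is the behavior the comparison above relies on.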
16.2. Methods
16.2.1. Experimental Data
For the clustering studies described in this report, we used experimental
microarray data derived from a study of the role of the Ras/protein kinase A
(PKA) pathway in glucose signaling in yeast [69]. These experiments analyzed mRNA
levels in samples extracted from cells at various times following stimulation by
glucose or following activation of either Ras2 or Gpa2, which are small GTPases
involved in the metabolic and transcriptional response of yeast cells to
glucose [57]. These experiments were performed in wild-type cells and cells defective
in PKA activity. Clustering these microarray data has proven to be a critical step in
using the data to develop a predictive model of a topological map of the signaling
network surrounding the Ras/PKA pathway [47].
Levels of RNA for each of the 6237 yeast genes in each of the RNA samples
from the above experiments were measured using Affymetrix microarray chips
and analyzed by the Affymetrix software. Each of the eight test and control
experiments consisted of four time points over a one-hour period, yielding 32 data points
for each of the 6237 genes. We used the Affymetrix MicroArray Suite 5.0, which
analyzes the consensus of intensities of hybridization of an RNA to the collection
of perfect match probes for a gene on the array, relative to the intensities of hy-
bridization to single mismatch probes, to further determine whether a signal for a
specific RNA in a sample was reliable (P or present), unreliably low (A or absent),