son correlations were employed. We sought to generate 500 clusters, because that
should yield clusters of roughly the same size as those produced by the paraclique
algorithm. Iteration was performed until convergence. IPA requires that each
network be analyzed separately (no batch mode is available), a process that can
be quite time-consuming. Thus, only a small number of clusters, say ten, could be
selected for further analysis. For paraclique, we simply selected the first ten outputs.
Deciding on a representative set of K-means outputs was not as straightforward.
We therefore chose to select K-means clusters under three different criteria. One
criterion was to choose the ten largest clusters. Another was to favor those ten
with the highest edge density in the p=0.01 graph. In case this produced unfairly
small genesets, we also required that for a cluster to be selected it had to have
size at least 50, the same lower bound we use for paraclique. The third criterion
was based on paraclique overlap. For this we chose the ten K-means clusters with
the highest percentage overlap with some paraclique, again insisting that a cluster
had to have at least 50 vertices. Overlap ranged from roughly 45% to 64%, with
an average of about 55%. Table 10.2 summarizes these results.
All values are averaged over the relevant ten clusters.
Table 10.2. Paraclique versus K-Means

                  Probe    Edge      Focus    Genes      Percent    Focus Genes
Method            Sets     Density   Genes    Utilized   Utilized   per Network
Paraclique        254.3    97.1%     146.9    140.7      95.5%      14.4
Large K-means     244.0    31.5%     143.1    133.5      93.0%      12.8
Dense K-means      80.9    84.6%      52.3     46.7      89.4%      12.8
Overlap K-means    89.0    79.6%      55.7     49.9      89.8%      12.3
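
The three K-means selection criteria described above (largest clusters, highest edge density, highest paraclique overlap) can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the representation of clusters as sets of vertex ids, and the choice to measure overlap as the fraction of a cluster's vertices shared with a single paraclique are all assumptions made for illustration; the toy data at the end is hypothetical and uses a smaller size bound and a top-1 selection so that the result is easy to check by hand.

```python
from itertools import combinations


def edge_density(cluster, edges):
    """Fraction of all vertex pairs in `cluster` that appear in `edges`.

    `edges` is a set of frozenset({u, v}) pairs, standing in for the
    thresholded (p = 0.01) correlation graph.
    """
    members = sorted(cluster)
    n = len(members)
    if n < 2:
        return 0.0
    present = sum(
        1 for u, v in combinations(members, 2) if frozenset((u, v)) in edges
    )
    return present / (n * (n - 1) / 2)


def best_overlap(cluster, paracliques):
    """Largest fraction of `cluster` shared with any one paraclique.

    Assumption: overlap is normalized by the K-means cluster size.
    """
    return max(
        (len(cluster & p) / len(cluster) for p in paracliques), default=0.0
    )


def select_clusters(clusters, edges, paracliques, top=10, min_size=50):
    """Return the `top` clusters under each of the three criteria."""
    # The size bound applies to the density and overlap criteria only.
    eligible = [c for c in clusters if len(c) >= min_size]
    return {
        "large": sorted(clusters, key=len, reverse=True)[:top],
        "dense": sorted(
            eligible, key=lambda c: edge_density(c, edges), reverse=True
        )[:top],
        "overlap": sorted(
            eligible, key=lambda c: best_overlap(c, paracliques), reverse=True
        )[:top],
    }


# Hypothetical toy data (not from the study).
c1 = {1, 2, 3, 4, 5}   # largest, but sparse
c2 = {6, 7, 8}         # fully connected
c3 = {9, 10, 11}       # contained in a paraclique
toy_edges = {
    frozenset(pair) for pair in [(1, 2), (6, 7), (6, 8), (7, 8), (9, 10)]
}
toy_paracliques = [{9, 10, 11, 12}, {6, 20}]

picks = select_clusters(
    [c1, c2, c3], toy_edges, toy_paracliques, top=1, min_size=3
)
```

With the defaults `top=10` and `min_size=50`, the same call mirrors the bounds stated in the text. Each criterion is ranked independently, so a cluster may be selected under more than one of them.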
By inspection, paraclique is superior to K-means clustering in terms of density.
The case for superior biological relevance is perhaps less obvious. We therefore
performed ANOVA tests for statistical significance. The number of focus genes
per network was higher (p < .001) for paraclique than for any of the K-means
methods. And while paraclique did not differ markedly from Large K-means in
terms of cluster size, it was more successful than other K-means methods in both
size and percent focus genes utilized (p < .05).
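
The text does not say which ANOVA design was used. As a hedged illustration only, the sketch below computes the F statistic of a one-way ANOVA across groups, which is one plausible reading when a single measure (for example, focus genes per network) is compared across methods. The function name and the sample values are hypothetical and are not taken from the study. Converting F to a p-value requires the F distribution, which is omitted here; `scipy.stats.f_oneway` returns both if SciPy is available.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of observations per method.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical placeholder observations for three methods.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
```

A large F indicates that the differences between group means are large relative to the spread within groups, which is the comparison that the reported p-values summarize.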
10.6. Proteomic Data Integration
We now consider the problem of combining quantitative transcript and protein
data for analysis. Only a few studies have been reported (see, for example, [2]).
The related problem of combining gene expression with measures of function
was recently considered in [3].
There gene ontology, phenotypes and protein-