It may also be instructive to compare IPA's outputs visually. Figs. 10.5 and 10.6
contain screenshots of merged network diagrams created by IPA for HNRPK.
Fig. 10.5 was generated from the list of transcripts produced by our clique-centric
method; Fig. 10.6 was generated from the list produced by mere correlate ranking.
Focus genes are depicted in grey. Connections to the anchor protein are rendered
in blue. Glyph shapes vary depending on IPA classifiers.
The IPA screenshots shown in Figs. 10.5 and 10.6 demonstrate that the two
methods we consider create quite different networks, and that the anchor protein
is connected to more genes in the network created by the clique-centric algorithm.
10.7. Remarks
We have studied clique-centric algorithms in the context of effective biological
data clustering. Statistical quality based primarily on edge density and
biological significance based on curated pathway matching have demonstrated the
utility of paraclique and related methods. We have also considered the problem of
inhomogeneous data integration. Transcriptomic data from gene expression arrays
and proteomic data from 2D gels have been reconciled to identify biological
networks for further scrutiny.
We emphasize that this work has been limited in scope to the analysis of
inhomogeneous data of relevance to type 1 diabetes. It is not meant to provide a
comprehensive guide to the literature. Nor is it intended to serve as an exhaustive
comparison of clustering methods. Such a task would be an enormous challenge,
requiring the implementation of a huge number of algorithms, and necessitating
tests across a great many diverse datasets.
There are a variety of ways to modify and enhance paraclique and the other
algorithms we describe. In [6], for example, an optional user-defined threshold
parameter is provided to help guide the search for edges affected by noise. For
simplicity, we have ignored this parameter here and considered only the effect
of the glom factor. Another enhancement is to glom vertices in stages, invoking
paraclique iteratively until a certain threshold is reached. Initial results suggest
that this procedure can further increase paraclique size while maintaining both
edge density and biological fitness as measured by IPA.
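
The glom step and its staged variant can be illustrated with a minimal sketch.
It is not the implementation of [6]. We assume, for illustration only, that a
vertex is glommed when it is non-adjacent to at most `glom` members of the
current cluster, and that the stopping threshold is a lower bound `min_density`
on edge density. The function names, the dictionary-of-sets graph
representation and both parameters are hypothetical.

```python
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_density(adj, vertices):
    """Fraction of all possible edges that are present among `vertices`."""
    vs = list(vertices)
    n = len(vs)
    if n < 2:
        return 1.0
    present = sum(1 for u, v in combinations(vs, 2) if v in adj[u])
    return present / (n * (n - 1) / 2)


def paraclique(adj, seed, glom):
    """One glom sweep (assumed rule): absorb each outside vertex that is
    non-adjacent to at most `glom` members of the current cluster."""
    cluster = set(seed)
    for v in sorted(set(adj) - cluster):
        if len(cluster - adj[v]) <= glom:
            cluster.add(v)
    return cluster


def staged_paraclique(adj, seed, glom, min_density):
    """Invoke paraclique repeatedly; stop when a sweep adds nothing or when
    the grown cluster would fall below the assumed density threshold."""
    cluster = set(seed)
    while True:
        grown = paraclique(adj, cluster, glom)
        if grown == cluster or edge_density(adj, grown) < min_density:
            return cluster
        cluster = grown


# Toy graph: a 4-clique {a, b, c, d}, a vertex e missing one edge to it,
# and a vertex f attached only to a.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("e", "a"), ("e", "b"), ("e", "c"), ("f", "a")]
adj = build_adj(edges)
seed = {"a", "b", "c", "d"}
result = staged_paraclique(adj, seed, glom=1, min_density=0.8)
print(sorted(result), edge_density(adj, result))
```

In this toy graph, a glom factor of 1 absorbs e but not f, so the cluster grows
while its density stays high. A glom factor of 0 leaves the seed clique
unchanged.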
Finally, we observe that pleiotropism is common in genes and gene products. It
is thus a major reason for the popularity of soft clustering methods such as clique:
a vertex can lie in more than one clique, just as an oligonucleotide or a protein can
lie in more than one pathway. Noise and the need for simpler structures motivate
the paraclique algorithm. The clusters produced are robust with respect to a few
missing edges. Unfortunately, they no longer overlap with the basic paraclique