Biology Reference
In-Depth Information
protein interaction were used to devise distance measures and permutation tests
for strength of commonality in graphs from these different data sources. Although
no quantitative protein values were employed, data derived from Saccharomyces
cerevisiae, commonly known as baker's or budding yeast, suggested that similarity
in expression is related to similarity in function.
Our main goal is to identify biological pathways, each of which is anchored
by a protein of interest. We are fortunate that both gene expression array data and
protein gel data were collected from the exact same samples. If it were not for the
expense involved, we would wonder why this is not done more often. Nevertheless,
data integration remains a formidable task. The biggest difficulty we must
overcome is probably that transcriptomic and proteomic data are generated by
two completely different and unrelated processes. Thus we will not be able to use
parametric statistical procedures, including the highly favored Pearson correlation
technique, to relate one to the other. Another problem is that current technologies for protein sensing
are generally inferior to those for transcript detection. Modern expression array
platforms can often detect transcripts for more than 50% of the known genes in the
relevant organism, and generate highly reproducible quantitative measurements.
In contrast, protein identification platforms can seldom cover more than 10% of
an organism's estimated number of proteins, and with only moderate quantization
and reproducibility. Of course, function is a direct consequence of proteins, not
mRNA, and so the importance of protein expression cannot be overstated.
Finally, it is well known that gene expression at the mRNA level will not always
correlate well with gene expression at the protein level. After all, gene products
are subject to post-transcriptional and post-translational modifications, degradation
and other factors. Put together, these difficulties make any serious attempt
at transcript-protein co-expression analysis a huge challenge. In the sequel, we
shall address this challenge with non-parametric methods, graph algorithms and a
clique-centric combinatorial approach.
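As a minimal sketch of what a non-parametric measure looks like in this setting, the Python fragment below computes a rank correlation (Spearman's, one common choice) alongside Pearson's coefficient. The function names and the six-sample abundance values are hypothetical and chosen purely for illustration; they are not taken from the data discussed here.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient applied to ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical abundances over six samples (illustrative values only).
transcript = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
protein = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]  # monotone in transcript, not linear

rho = spearman(transcript, protein)
r = pearson(transcript, protein)
print(f"Spearman rho = {rho:.4f}, Pearson r = {r:.4f}")
```

On this toy input the rank correlation equals 1 (up to floating-point rounding) because the relationship is perfectly monotone, while Pearson's coefficient falls below 1 because the relationship is not linear. The rank-based measure depends only on the ordering of the values, which is why it makes no assumption about how the two measurement processes scale.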
We begin with the establishment of two correlation structures. For
transcript-transcript relationships, we retain the Pearson's coefficients already computed.
Transcript-protein relationships are typically much weaker and, for reasons already
stated, require a non-parametric approach. For these we employ the rank
metric provided by Spearman's correlation technique. This naturally leads to the
loss of some information; a simple ranked list “flattens” raw data values. Our aim
is now two-fold. We still wish to find dense, well-connected subgraphs. Yet these
subgraphs must also be anchored as much as possible about some given protein,
p, under scrutiny. Of course we could simply choose a putative pathway to be
p and those transcripts ranked most highly with it. As we shall show, however,
we can do better with the use of graph structure. To accomplish this, we take the