Biology Reference
In-Depth Information
transcript graph and add to it a new vertex for protein p. We then use the rank
order provided by the Spearman's coefficient list to add edges connecting p with
transcript vertices. We add these edges until the subgraph induced by p and its
neighbors contains at least 100 maximal cliques each of size at least 40. We then
output p along with the 60 or fewer vertices that most highly populate the resul-
tant set of cliques. The values 40, 60 and 100 were chosen based on trial and error
combined with our previous experience working with the idiosyncrasies of IPA.
Other values may be superior in other applications.
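The anchoring procedure described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The function names, the dictionary-of-sets graph representation, and the tie-breaking by correlation rank are all assumptions, and the stopping rule is read here as "at least 100 maximal cliques of size 40 or more". Re-enumerating cliques after every added edge is done for clarity only and would be slow on real data.

```python
from collections import Counter


def maximal_cliques(adj):
    """Yield the maximal cliques of an undirected graph given as
    {vertex: set(neighbors)}, using Bron-Kerbosch with pivoting."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbors among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


def anchor_protein(adj, ranked, p="p", min_cliques=100, min_size=40, max_out=60):
    """adj: transcript graph as {transcript: set(neighbors)}.
    ranked: transcripts in decreasing order of Spearman correlation with p.
    Returns p followed by up to max_out transcripts that occur most often
    in the qualifying cliques (ties broken by rank, an assumption)."""
    neighbors, cliques = set(), []
    for t in ranked:
        neighbors.add(t)  # add the edge p-t
        # Subgraph induced by p and its current neighbors.
        sub = {u: (adj[u] & neighbors) | {p} for u in neighbors}
        sub[p] = set(neighbors)
        cliques = [c for c in maximal_cliques(sub) if len(c) >= min_size]
        if len(cliques) >= min_cliques:
            break
    # If the target is never reached, the last clique set is used (assumption).
    counts = Counter(v for c in cliques for v in c if v != p)
    rank = {t: i for i, t in enumerate(ranked)}
    top = sorted(counts, key=lambda v: (-counts[v], rank[v]))[:max_out]
    return [p] + top
```

With the values from the text, the call would use the defaults (100 cliques, size 40, 60 output vertices). Much smaller values are convenient for experimenting on toy graphs.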
To test this approach, we chose six proteins on which IPA contained informa-
tion, which were well-expressed in the experimental samples, and which appear to
be orthogonal to each other in terms of their biological function. Two of the six,
HNRPK and EIF4A1, are of special interest because they are generally known
to have increased expression in NOD mice relative to the NON and C57BL/6
strains [14]. The other four are ACTB, GDI2, GNB2L1 and ZBTB1. We also
chose three different transcript graphs constructed from respective Pearson corre-
lation thresholds 0.60, 0.70, and 0.80. For each of these 18 tests, maximal cliques
were highly overlapping, as expected. As a measure of a cluster's biological rel-
evance, we examine a metric we call protein links. Protein links is a count of the
number of connections between an anchored protein and the network created by
IPA. For each protein, we chose the threshold setting that maximizes protein links,
with ties broken in favor of the higher threshold. The lowest threshold, 0.60, had
none of the best results. It is probably the case that, in a graph this dense, the
transcript-transcript relationships drown out protein-transcript correlations.
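The threshold selection rule (maximize protein links, break ties in favor of the higher threshold) can be written compactly. The sketch below is illustrative only: the function name is hypothetical, and the link counts in the usage note are made-up values, not results from the study.

```python
def best_threshold(links_by_threshold):
    """Return the correlation threshold with the most protein links.
    Ties are broken in favor of the higher threshold."""
    # Compare on (link count, threshold) so the higher threshold wins a tie.
    return max(links_by_threshold, key=lambda t: (links_by_threshold[t], t))
```

For example, with hypothetical counts {0.60: 3, 0.70: 6, 0.80: 6}, the function returns 0.80, because 0.70 and 0.80 tie on links and the higher threshold is preferred.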
As a control, we compared the quality of the transcript sets we produced
against the 60 transcripts that simply correlate most highly with the protein. GDI2
and ZBTB1 had fewer than three protein links for all four results (the three thresh-
old values plus the straight correlation list), and so were dropped from further
analysis. Results for each of the four remaining proteins are shown in Table 10.3.
Table 10.3. Clique vs Correlates

Protein   Algorithm         Probe Sets   Focus Genes   Protein Links
ACTB      Clique at 0.70    60           42            6
          Correlates List   60           27            6
EIF4A1    Clique at 0.70    59           50            7
          Correlates List   60           41            2
GNB2L1    Clique at 0.70    60           39            6
          Correlates List   60           37            3
HNRPK     Clique at 0.80    55           38            5
          Correlates List   60           42            3

From this table, we see that our clique-centric approach builds subgraphs that
are no worse and in fact generally better than those simply defined by ranking