Graph Algorithms for Integrated Biological Analysis, with Applications to Type 1 Diabetes Data - Clustering Challenges in Biological Network - page 215


transcript graph and add to it a new vertex for protein p. We then use the rank order provided by the Spearman's coefficient list to add edges connecting p with transcript vertices. We add these edges until the subgraph induced by p and its neighbors contains at least 100 maximal cliques each of size at least 40. We then output p along with the 60 or fewer vertices that most highly populate the resultant set of cliques. The values 40, 60 and 100 were chosen based on trial and error combined with our previous experience working with the idiosyncrasies of IPA. Other values may be superior in other applications.
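The anchoring procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (maximal_cliques, anchor_protein), the adjacency-dictionary representation, and the choice of a pivoting Bron-Kerbosch enumeration are all ours, and the sketch naively re-enumerates cliques after every added edge, which would be too slow for graphs of realistic size.

```python
from collections import Counter


def maximal_cliques(adj):
    """Enumerate all maximal cliques with Bron-Kerbosch (with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def anchor_protein(transcript_adj, ranked_transcripts, protein="p",
                   min_cliques=100, min_size=40, max_output=60):
    """Attach a protein vertex to a transcript graph, strongest correlate first.

    ranked_transcripts is assumed to be sorted by decreasing Spearman
    correlation with the protein. Returns the protein followed by up to
    max_output vertices, or None if the stopping condition is never met.
    """
    # Copy the transcript graph and add an isolated vertex for the protein.
    adj = {v: set(nbrs) for v, nbrs in transcript_adj.items()}
    adj[protein] = set()
    for t in ranked_transcripts:
        adj[protein].add(t)
        adj.setdefault(t, set()).add(protein)
        # Subgraph induced by the protein and its current neighbours.
        keep = adj[protein] | {protein}
        sub = {v: adj[v] & keep for v in keep}
        big = [c for c in maximal_cliques(sub) if len(c) >= min_size]
        if len(big) >= min_cliques:
            break
    else:
        return None  # ranked list exhausted before enough cliques appeared
    # Count how many of the large cliques each transcript belongs to.
    counts = Counter(v for c in big for v in c if v != protein)
    return [protein] + [v for v, _ in counts.most_common(max_output)]


# Toy graph with scaled-down parameters (2 cliques of size >= 3, 1 output).
toy = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
result = anchor_protein(toy, ["a", "b", "c"],
                        min_cliques=2, min_size=3, max_output=1)
```

In the toy graph, transcript a belongs to both qualifying cliques, so it is the single vertex returned alongside p. The defaults mirror the values 100, 40 and 60 given in the text.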

To test this approach, we chose six proteins on which IPA contained information, which were well-expressed in the experimental samples, and which appear to be orthogonal to each other in terms of their biological function. Two of the six, HNRPK and EIF4A1, are of special interest because they are generally known to have increased expression in NOD mice relative to the NON and C57BL/6 strains [14]. The other four are ACTB, GDI2, GNB2L1 and ZBTB1. We also chose three different transcript graphs constructed from respective Pearson correlation thresholds 0.60, 0.70, and 0.80. For each of these 18 tests, maximal cliques were highly overlapping, as expected. As a measure of a cluster's biological relevance, we examine a metric we call protein links. Protein links is a count of the number of connections between an anchored protein and the network created by IPA. For each protein, we chose the threshold setting that maximizes protein links, with ties broken in favor of the higher threshold. The lowest threshold, 0.60, had none of the best results. It is probably the case that, in a graph this dense, the transcript-transcript relationships drown out protein-transcript correlations.
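The threshold selection rule (maximize protein links, break ties in favor of the higher threshold) can be written compactly. The following is a minimal sketch; the function name best_threshold is ours, and the counts in example_links are hypothetical values chosen only to exercise the tie-break, not results from the study.

```python
def best_threshold(links_by_threshold):
    """Pick the threshold with the most protein links.

    Comparing (links, threshold) tuples means that when two thresholds
    have the same link count, the numerically higher threshold wins.
    """
    return max(links_by_threshold,
               key=lambda t: (links_by_threshold[t], t))


# Hypothetical protein-link counts for one protein (illustration only).
example_links = {0.60: 4, 0.70: 6, 0.80: 6}
chosen = best_threshold(example_links)
```

Here 0.70 and 0.80 tie on link count, so the rule selects 0.80.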

As a control, we compared the quality of the transcript sets we produced against the 60 transcripts that simply correlate most highly with the protein. GDI2 and ZBTB1 had fewer than three protein links for all four results (the three threshold values plus the straight correlation list), and so were dropped from further analysis. Results for each of the four remaining proteins are shown in Table 10.3.
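The control list and the exclusion rule can be sketched as follows. The helper names (top_correlates, proteins_to_keep) and all sample values are hypothetical, and the sketch assumes that "correlate most highly" means ranking by the signed coefficient; the text does not say whether absolute values were used.

```python
def top_correlates(correlations, k=60):
    """Return the k transcripts with the highest correlation to the protein."""
    ranked = sorted(correlations, key=correlations.get, reverse=True)
    return ranked[:k]


def proteins_to_keep(protein_links, min_links=3):
    """Keep a protein if at least one of its results reaches min_links.

    protein_links maps each protein to its link counts across all results
    (three thresholds plus the plain correlates list). A protein is dropped
    only when every one of its counts is below min_links.
    """
    return [p for p, counts in protein_links.items()
            if max(counts) >= min_links]


# Hypothetical correlations and link counts (illustration only).
sample_corr = {"t1": 0.9, "t2": 0.5, "t3": 0.7}
control = top_correlates(sample_corr, k=2)
kept = proteins_to_keep({"A": [6, 5, 4, 6], "B": [2, 1, 0, 2]})
```

With these made-up inputs, the control list is the two strongest correlates and protein B is dropped because none of its four results reaches three links.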

Table 10.3. Clique vs Correlates

Protein   Algorithm         Probe Sets   Focus Genes   Protein Links
ACTB      Clique at 0.70    60           42            6
          Correlates List   60           27            6
EIF4A1    Clique at 0.70    59           50            7
          Correlates List   60           41            2
GNB2L1    Clique at 0.70    60           39            6
          Correlates List   60           37            3
HNRPK     Clique at 0.80    55           38            5
          Correlates List   60           42            3
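The rows of Table 10.3 can also be compared programmatically. The sketch below transcribes the table values and computes, for each protein, how many more focus genes and protein links the clique-based set has than the correlates list; the variable and function names are ours.

```python
from collections import defaultdict

# (protein, algorithm, probe sets, focus genes, protein links) from Table 10.3
TABLE_10_3 = [
    ("ACTB", "Clique at 0.70", 60, 42, 6),
    ("ACTB", "Correlates List", 60, 27, 6),
    ("EIF4A1", "Clique at 0.70", 59, 50, 7),
    ("EIF4A1", "Correlates List", 60, 41, 2),
    ("GNB2L1", "Clique at 0.70", 60, 39, 6),
    ("GNB2L1", "Correlates List", 60, 37, 3),
    ("HNRPK", "Clique at 0.80", 55, 38, 5),
    ("HNRPK", "Correlates List", 60, 42, 3),
]


def compare(rows):
    """Return clique-minus-correlates differences per protein."""
    by_protein = defaultdict(dict)
    for protein, algorithm, _probes, focus, links in rows:
        kind = "clique" if algorithm.startswith("Clique") else "correlates"
        by_protein[protein][kind] = (focus, links)
    return {
        protein: {
            "focus_gain": r["clique"][0] - r["correlates"][0],
            "link_gain": r["clique"][1] - r["correlates"][1],
        }
        for protein, r in by_protein.items()
    }


gains = compare(TABLE_10_3)
for protein, g in gains.items():
    print(protein, g)
```

On protein links, the clique-based set ties the correlates list for ACTB and exceeds it for the other three proteins. On focus genes it is higher for three proteins and lower for HNRPK.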

From this table, we see that our clique-centric approach builds subgraphs that

are no worse and in fact generally better than those simply defined by ranking
