from a range of other databases, and is manually curated. Data in BioGRID are labelled
with the PubMed ID of the publication from which they are derived, and are tagged with
annotations, drawn from a controlled vocabulary, indicating the type of experiment
from which they were generated (e.g. “Affinity Capture-mRNA”, “Two-hybrid” and
“Co-localization”). Another useful database is STRING 14 (Szklarczyk et al., 2011),
which contains data about protein-protein interactions derived from a variety of
microbial species. Some of the data in STRING are computationally generated,
and may not be as reliable as human-curated data. The Microbial Protein Interaction
Database 15 (Goll et al., 2008) aims to collect and provide all known physical
microbial interactions.
Despite the relative dearth of data for some microbial species, interactome analysis
can be valuable even for data generated completely in-house. In particular, the ability
to combine microarray data with other information, such as KEGG pathways or GO
annotations, can provide an entirely new perspective on the functional principles
and dynamics of an entire cellular system (Tseng et al., 2012). An interesting
recent review on the integration of multiple microbial “omics” datasets is provided
by Zhang and colleagues (Zhang et al., 2010b), while Hallinan and co-workers discuss
both microbial network integration and analysis (Hallinan et al., 2011).
Interactome analysis has been widely used in microbiology for predicting the
function of unannotated proteins. An integrated network is constructed, and then
subjected to cluster analysis. A cluster, in a network context, can be defined as “a
group of nodes which are more tightly connected to each other than to the rest of
the network” ( Hallinan et al. , 2009 ). Most of the algorithms used for clustering
non-network data can be adapted for clustering networks. Following clustering,
the biological function of unknown proteins can be predicted using a “guilt-by-
association” approach; proteins which occur in a cluster dominated by proteins with
a single, known function are deemed to be likely to also have that function.
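The guilt-by-association step described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tool: it assumes the clustering has already been done, and the function name `predict_functions`, the `min_fraction` threshold, and the protein identifiers and functions in the example are all hypothetical.

```python
from collections import Counter


def predict_functions(clusters, annotations, min_fraction=0.6):
    """Assign a function to unannotated proteins by guilt-by-association.

    clusters     -- list of clusters, each a list of protein identifiers
    annotations  -- dict mapping protein identifier to a known function;
                    proteins that are absent (or map to None) are unannotated
    min_fraction -- assumed cut-off: the share of annotated cluster members
                    that must agree before a function counts as dominant
    """
    predictions = {}
    for members in clusters:
        # Collect the known functions within this cluster.
        known = [annotations[p] for p in members if annotations.get(p)]
        if not known:
            continue  # nothing to transfer from
        function, count = Counter(known).most_common(1)[0]
        if count / len(known) < min_fraction:
            continue  # no single function dominates this cluster
        # Transfer the dominant function to the unannotated members.
        for p in members:
            if not annotations.get(p):
                predictions[p] = function
    return predictions


# Hypothetical example data.
example_clusters = [["P1", "P2", "P3", "P4"], ["P5", "P6", "P7"]]
example_annotations = {
    "P1": "DNA repair",
    "P2": "DNA repair",
    "P3": "DNA repair",
    "P5": "transport",
    "P6": "cell wall synthesis",
}

print(predict_functions(example_clusters, example_annotations))
# prints {'P4': 'DNA repair'}
```

In this example P4 inherits "DNA repair" because every annotated member of its cluster shares that function, while P7 receives no prediction because its cluster has no dominant function. The threshold is a design choice: a higher value yields fewer but more conservative predictions.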
Many algorithms use interactomes to infer the function of co-clustered genes
(Wang and Marcotte, 2010). These are described in more detail in the
Case Study (below).
Integrated networks can be used for a variety of other tasks, such as generating
hypotheses about gene function, analysing gene lists and prioritising lists of genes for
further functional assays. One such application, which has been the subject of
considerable interest over the last decade or so, is the identification of network motifs.
Network motifs are sets of small numbers of nodes, usually three to five, connected
in a particular manner. They are assumed to perform a specific function, such as the
amplification or damping of a specific signal, via a feed-forward or feedback loop
like that represented by the lac operon in E. coli.
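As an illustration of what searching for a motif involves, the sketch below enumerates one three-node pattern, the feed-forward loop (A regulates B, B regulates C, and A also regulates C directly), in a directed network given as an edge list. It is a minimal example under stated assumptions: the function name `find_feed_forward_loops` and the node labels are hypothetical, and a full motif analysis would also compare the observed counts with those in randomised networks, which is not done here.

```python
def find_feed_forward_loops(edges):
    """Return all feed-forward loops (a, b, c) in a directed edge list.

    A feed-forward loop consists of the three edges a->b, b->c and a->c.
    Self-loops are ignored.
    """
    # Build a successor map: node -> set of nodes it points to.
    successors = {}
    for source, target in edges:
        if source != target:
            successors.setdefault(source, set()).add(target)

    loops = []
    for a, a_targets in successors.items():
        for b in a_targets:
            for c in successors.get(b, ()):
                # c is reached indirectly via b; check for the direct edge a->c.
                if c != a and c in a_targets:
                    loops.append((a, b, c))
    return sorted(loops)


# Hypothetical regulatory network: X -> Y -> Z with a direct X -> Z edge,
# plus an unrelated downstream edge Z -> W.
example_edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]

print(find_feed_forward_loops(example_edges))
# prints [('X', 'Y', 'Z')]
```

Brute-force enumeration like this is adequate for small patterns and modestly sized networks; larger motifs generally call for dedicated subgraph-search tools.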
Motifs are believed to afford a mapping between network topology and real biological
dynamics; the hope is that the time course behaviour of a large, complex
14 http://string-db.org/
15 http://jcvi.org/mpidb/about.php