constructing scale-free priors, in a manner similar to the
scale-free priors others have constructed to integrate
expression and genetic data [48]. Given a transcription
factor T, and a set of genes, G, that contain the binding site
of T, the TF prior, $p_{tf}$, can be defined so that it is proportional
to the number of expression traits correlated with the
TF expression levels, for genes carrying the corresponding
TFBS:
which DNA variation, RNA expression, and metabolite
levels have been assessed [49-51]; and (2) protein-DNA
binding, protein-protein interaction, and
metabolite-protein interaction data available from public
data sources and generated independently of the BXR
cross (referred to here as non-BXR data). The BXR data
are reflected as nodes in the network to be constructed,
where edges in the network reflect statistically inferred
causal relationships among the expression and metabolite
traits. The non-BXR interaction data from public sources
are used to derive the types of structure priors discussed
above on the network to both constrain the size of the
search space in finding the best network and enhance the
ability to infer causal relationships between the network
nodes [26].
To illustrate the steps in the type of Bayesian network
reconstruction procedure described above and detailed
more formally [51], and to examine contributions from the
different data types used to construct the network, I focus
on genes and metabolites involved in the de novo
biosynthesis of pyrimidine ribonucleotides (Figure 26.4).
For simplicity I focus on the reconstruction of this smaller
subset of genes, although the steps are similar if building
a network from a more comprehensive set of genes. The
subnetwork depicted in Figure 26.4a was identified from
the full Bayesian network constructed from the BXR data
[51]. URA3 in this network was predicted as a causal
regulator of gene expression traits linked to the URA3
locus. That is, using the full Bayesian network, in silico
perturbations were carried out by simulating changes in
each of the nodes and identifying those nodes that resulted
in the most significant changes in other nodes in the
network. As a result of this simulation, URA3 was identified
as the regulator causally modulating the largest
number of nodes in the subnetwork
(Figure 26.4a). A deletion of URA3 was engineered in the
parental strain RM11-
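$$\log\big(p_{tf}(T \to g)\big) \propto \log\Big(\sum_{g_i \in G} p_{qtl}(T \to g_i)\,\delta\Big),$$
where $p_{qtl}(T \to g)$ is the prior for the QTL and
$$\delta = \begin{cases} 1, & \text{if } \mathrm{corr}(T, g_i) \ge r_{\text{cutoff}} \\ 0, & \text{if } \mathrm{corr}(T, g_i) < r_{\text{cutoff}} \end{cases}$$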
The correlation cutoff $r_{\text{cutoff}}$ can be determined by
permuting the data and then selecting the maximum
correlation values in the permuted data sets (corresponding
to some predetermined reasonable false discovery rate).
This form of the structure prior favors transcription factors
that have a large number of correlated responding genes.
From the set of priors computed from the inferred and
experimentally determined TFBS set, only non-negative
priors should be used to reconstruct the Bayesian network.
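The TF prior and its permutation-based cutoff can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the cited work: the function names (`pearson`, `permutation_cutoff`, `tf_log_prior`), the `alpha` parameter standing in for the predetermined false discovery rate, and the choice of taking a quantile of the per-permutation maximum correlations are all assumptions made for the example. The returned value is the logarithm of an unnormalized prior, since the text only defines the prior up to proportionality.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_cutoff(tf_expr, gene_exprs, n_perm=200, alpha=0.05, seed=0):
    """Estimate r_cutoff from maximum correlations in permuted data.

    Assumption: the cutoff is the (1 - alpha) quantile of the maxima
    observed across permutations; alpha is a hypothetical error rate.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_perm):
        shuffled = list(tf_expr)
        rng.shuffle(shuffled)  # break the pairing between TF and gene samples
        maxima.append(max(pearson(shuffled, g) for g in gene_exprs))
    maxima.sort()
    idx = max(0, min(n_perm - 1, math.ceil((1.0 - alpha) * n_perm) - 1))
    return maxima[idx]

def tf_log_prior(tf_expr, gene_exprs, p_qtl, r_cutoff):
    """log of sum_i p_qtl[i] * delta_i, where delta_i = 1 iff corr >= r_cutoff.

    gene_exprs holds the expression profiles of the genes in G (those that
    carry the binding site of T); p_qtl holds the QTL prior for each of them.
    """
    total = sum(p for g, p in zip(gene_exprs, p_qtl)
                if pearson(tf_expr, g) >= r_cutoff)
    return math.log(total) if total > 0 else float("-inf")
```

A transcription factor with no target passing the cutoff receives a log prior of negative infinity in this sketch, which is one simple way of excluding it; a small floor value would be an equally reasonable design choice.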
For those protein complexes that could not be integrated
into the network reconstruction process using scale-free
priors, uniform priors were used for pairs of genes in these
complexes (i.e., $p_{pc}(g_i \to g_j) = p_{pc}(g_j \to g_i) = c$).
Small molecule-protein interactions can also be incorporated
into the Bayesian network reconstruction process.
Chemical reactions reflected in biochemical pathways and
the associated catalyzing enzymes can be identified as
metabolite-enzyme pairs from existing pathway databases
such as KEGG. These relationships can then be stored in an
adjacency matrix in which a 1 in a cell represents a direct
connection between the metabolite and the enzyme. The
shortest distance $d_{m,e}$ from an enzyme e to a metabolite m
can then be calculated using the repeated matrix multiplication
algorithm. The structure prior for the gene expression
of an enzyme e affecting the metabolite concentration is
related to their shortest distance $d_{m,e}$ through the prior $p(m \to e)$:
1a as a selectable marker, and
segregation of this locus among the BXR progeny is the
most likely cause for expression variation of uracil
biosynthesis genes linked to this locus [50]. Variation in
two metabolites is also linked to this locus: dihydro-orotic
acid, which is converted to orotic acid by the
enzyme Ura1p, and orotic acid itself, reflecting the functional
consequence of transcriptional variation in genes
involved in de novo pyrimidine base biosynthetic
processes on metabolite levels. The causal relationships
among URA1, orotic acid, and dihydro-orotic acid, as
well as the subnetwork for genes linked to the URA3
locus, recapitulate the known pyrimidine base biosynthesis
pathway [51]. This subnetwork not only captures the
co-regulation of gene expression and metabolite abundance
but also elucidates the mechanism by which genetic variation
in URA3 affects orotic acid and dihydro-orotic acid
levels.
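The in silico perturbation analysis described for the full network (simulating a change at each node and counting the nodes that respond) can be illustrated with a toy model. The sketch below is not the simulation used in [51]: it assumes a directed acyclic graph with hypothetical linear edge effects, holds the perturbed node fixed at a chosen change, and counts downstream nodes whose induced change exceeds an arbitrary `threshold`. The node labels and effect sizes in any call are placeholders, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

def perturbation_effects(edges, source, delta=1.0):
    """Propagate a change of size delta at `source` through a linear DAG.

    edges maps (parent, child) -> hypothetical linear effect size.
    Returns a dict {node: induced change}.
    """
    parents, nodes = {}, set()
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
        nodes.update((parent, child))
    # static_order() yields every node after all of its parents
    order = TopologicalSorter({n: parents.get(n, set()) for n in nodes}).static_order()
    change = {n: 0.0 for n in nodes}
    change[source] = delta
    for node in order:
        if node == source:
            continue  # the perturbed node is held at the imposed change
        change[node] = sum(edges[(p, node)] * change[p]
                           for p in parents.get(node, ()))
    return change

def rank_regulators(edges, threshold=0.1):
    """Rank nodes by how many other nodes respond when each is perturbed."""
    nodes = {n for edge in edges for n in edge}
    counts = {}
    for s in nodes:
        change = perturbation_effects(edges, s)
        counts[s] = sum(1 for n, v in change.items()
                        if n != s and abs(v) > threshold)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Calling `rank_regulators` on a dictionary of weighted edges returns the nodes ordered by the number of responders, so the first entry plays the role that URA3 plays in the subnetwork discussed here.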
$$p(m \to e) \propto e^{-\lambda d_{m,e}}.$$
The shorter the distance, the stronger the prior.
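The distance-based prior can be sketched as follows. This is a minimal illustration under stated assumptions: the adjacency matrix is treated as a small list of lists, shortest distances are obtained from successive Boolean powers of that matrix (one reading of the repeated matrix multiplication approach), and the decay rate `lam` and the function names are hypothetical. The prior is returned unnormalized.

```python
import math

def shortest_distances(adj):
    """Shortest path lengths by repeated Boolean matrix multiplication.

    adj[i][j] is 1 when nodes i and j are directly connected. The smallest k
    for which the k-th Boolean power of adj has a nonzero (i, j) entry is the
    shortest distance from i to j; unreachable pairs stay at math.inf.
    """
    n = len(adj)
    dist = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    power = [[bool(v) for v in row] for row in adj]  # walks of length 1
    for k in range(1, n):
        changed = False
        for i in range(n):
            for j in range(n):
                if power[i][j] and dist[i][j] == math.inf:
                    dist[i][j] = k
                    changed = True
        if not changed:
            break  # no pair at distance k, so none at any larger distance
        # Boolean product with adj gives walks of length k + 1
        power = [[any(power[i][t] and adj[t][j] for t in range(n))
                  for j in range(n)] for i in range(n)]
    return dist

def metabolite_enzyme_prior(dist, metabolite, enzyme, lam=1.0):
    """Unnormalized prior exp(-lam * d) for enzyme index -> metabolite index."""
    return math.exp(-lam * dist[enzyme][metabolite])
```

With `lam` greater than zero, a directly connected metabolite-enzyme pair receives the largest prior and an unreachable pair receives a prior of zero, which matches the rule that shorter distances give stronger priors.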
Illustrating the Construction of Predictive
Bayesian Networks with an Example
To illustrate how different types of data can be integrated
to construct predictive gene networks, consider the
following two classes of data: (1) DNA variation, gene
expression, and metabolite data measured in a previously
described cross between laboratory (BY) and wild (RM)
yeast strains (referred to here as the BXR cross) for