p value for the causal inference test corresponds to the p
value for an intersection-union test, or, simply, the
supremum of the four p values for the component tests [30].
This test has been implemented as the CIT package in the R
statistical programming language and is freely available.
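
The intersection-union rule described above can be illustrated with a minimal sketch. This is not the CIT package's code or interface; the function name `intersection_union_p` and the four p values are hypothetical and chosen only for illustration. The sketch assumes the four component p values have already been computed. Because the overall null hypothesis is rejected only when every component null is rejected, the overall p value is the largest of the component p values.

```python
def intersection_union_p(component_p_values):
    """Return the intersection-union p value: the largest component p value.

    Hypothetical helper for illustration; not part of the CIT package.
    """
    p_values = list(component_p_values)
    if not p_values:
        raise ValueError("at least one component p value is required")
    for p in p_values:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p value out of range: {p}")
    # For a finite set of component tests, the supremum is simply the maximum.
    return max(p_values)


# Illustrative (made-up) p values for the four component tests.
component_p = [0.001, 0.004, 0.030, 0.0002]
cit_p = intersection_union_p(component_p)
print(cit_p)  # 0.03
```

In this example the weakest component test (p = 0.03) determines the overall result, so a causal call is only as strong as its least supported condition.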
This type of test can be applied to
resolve the types of causal relationships depicted in
Figure 26.3. Application of these ideas in segregating
mouse populations has led to the identification and validation
of many genes causal for a number of metabolic
traits, including obesity, diabetes and heart disease. In one
such population constructed between the B6 and DBA
inbred strains of mouse, 111 F2 intercross animals were
placed on a high-fat atherogenic diet for 4 months beginning at 12
months of age. All animals were genotyped using
a genome-wide panel of markers, clinically characterized
with respect to a number of metabolic traits, and the livers
were expression profiled using a comprehensive gene
expression microarray. Given the pattern of genetic asso-
ciation between the metabolic and gene expression traits,
causal inference testing was carried out to identify the
genes in this population best supported as causal of obesity-
related traits [22,34]. Of the top nine genes identified in this
study supported as causative of obesity-related traits, eight
were ultimately experimentally validated [23]. The only
gene that failed to validate was an X-linked gene that was
lethal when completely knocked out, and so represented a more
complicated case for which the appropriate validation tools
could not be constructed.
provide a way to visualize extremely large-scale and
complex relationships among molecular and higher-order
phenotypes such as disease in any given context.
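
As a minimal sketch of the network idea, the relationships among molecular entities and higher-order phenotypes can be stored as a graph and queried. All node names below are hypothetical, and treating every edge as directed is an assumption made only for this illustration.

```python
from collections import defaultdict

# Hypothetical nodes and directed edges, chosen only for illustration.
edges = [
    ("DNA_variant_1", "RNA_gene_A"),
    ("RNA_gene_A", "protein_A"),
    ("protein_A", "metabolite_X"),
    ("RNA_gene_A", "obesity_trait"),
]

# Adjacency list: each node maps to the nodes it points to.
network = defaultdict(list)
for source, target in edges:
    network[source].append(target)


def downstream(network, start):
    """Return every node reachable from `start` by following directed edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for child in network.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


print(sorted(downstream(network, "DNA_variant_1")))
```

Here the query lists every entity and phenotype that a single DNA variation could influence under the assumed edges.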
Building from the Bottom Up or Top Down?
Two fundamental approaches to the reconstruction of
molecular networks dominate computational biology
today. The first is referred to as the bottom-up approach, in
which fundamental relationships between small sets of
genes that may comprise a given pathway are established,
thus providing the fundamental building blocks of higher-
order processes that are then constructed from the bottom
up. This approach typically assumes that we have more
complete knowledge regarding the fundamental topology
(connectivity structure) of pathways, and given this
knowledge, models are constructed that precisely detail
how changes to any component of the pathway affect other
components, as well as the known functions carried out by
the pathway (i.e., bottom-up approaches are hypothesis
driven). The second approach is referred to as a top-down
approach in which we take into account all data and our
existing understanding of systems and construct a model
that reflects whole system behavior, and from there tease
apart the fundamental components from the top down. This
approach typically assumes that our understanding of how
the network is actually wired is sufficiently incomplete that we must
objectively infer the relationships by considering large-scale,
high-dimensional data that inform on all relationships
of interest (i.e., top-down approaches are data driven).
Given our incomplete understanding of more general
networks and pathways in living systems, this chapter
focuses on a top-down approach to reconstructing predictive
networks, because this type of structure learning from
data is critical for deriving hypotheses that cannot otherwise
be efficiently proposed in the context of what is known
(from the literature, pathway databases, or other such
sources). However, top-down and bottom-up approaches
are complementary to one another, even though they
have largely been pursued as separate disciplines
with little cross-talk between them.
One of the future directions discussed in the conclusion is
the need to mathematically unify these two classes of
predictive modeling to produce probabilistic causal
networks that more fully leverage all available data
and knowledge.
In the context of integrating genetic, molecular profiling
and higher-order phenotypic data, biological networks are
comprised of nodes that represent molecular entities that
are observed to vary in a given population under study (e.g.,
DNA variations, RNA levels, protein states, or metabolite
levels). Edges between the nodes represent relationships
between the molecular entities, and these edges can either
FROM ASSESSING CAUSAL
RELATIONSHIPS AMONG TRAIT PAIRS
TO PREDICTIVE GENE NETWORKS
Leveraging DNA variation as a systematic perturbation
source to resolve the causal relationships among traits is
necessary but not sufficient for understanding the
complexity of living systems. Cells are comprised of many
tens of thousands of proteins, metabolites, RNA, and DNA,
all interacting in complex ways. Complex biological
systems are comprised of many different types of cells
operating within and between many different types of
tissues that make up different organ systems, all of which
interact in complex ways to give rise to a vast array of
phenotypes that manifest themselves in living systems.
Modeling the extent of such relationships between molec-
ular entities, between cells, and between organ systems is
a daunting task. Networks are a convenient framework for
representing the relationships among these different vari-
ables. In the context of biological systems, a network can
be viewed as a graphical model that represents relationships
among DNA, RNA, protein, metabolites, and higher-order
phenotypes such as disease state. In this way, networks