However, DNA variation in the germline provides an
excellent systematic perturbation source that can also be
used to resolve causal relationships in biological systems.
Because variations in DNA cause variations in RNA,
proteins, metabolites and subsequently higher-order
phenotypes, this source of variation can be leveraged to
infer causality. Unlike artificial perturbations such as gene
knockouts, transgenics, and chemical perturbations that
may induce artificial correlations that are not observed in
more natural settings, naturally occurring genetic variation
defines those perturbations that give rise to the broad array
of phenotypic variations (such as disease and drug
response) that we are precisely interested in elucidating.
The past 7 years have demonstrated that causal links
between DNA variations and molecular and higher-order
phenotypes can provide information on causal relationships
between those traits [3,4,21–24,26,30–34]. Causality in
this instance can be inferred because there is random
segregation of the chromosomes during gametogenesis,
thereby providing the appropriate randomization mechanism
to protect against confounding, similar to what is
achieved in randomized clinical trials by randomly
assigning patients to treatments to test the causal effects of
a drug of interest [35,36]. However, quantifying the
uncertainty in making such causal calls has been challenging.
For example, causal effect estimates often
considered in Mendelian randomization approaches can be
confounded by pleiotropic effects and reverse causation,
limiting the utility of such approaches for problems that
involve the reconstruction of regulatory networks, in which
pleiotropy is common and there may be little prior information
regarding the structure of the causal relationships
between the traits of interest [21].
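The randomization argument and its pleiotropy caveat can be illustrated with a small simulation. This is only a sketch under assumed conditions: the effect sizes, the hidden confounder h, and the function names are hypothetical, and the ratio cov(L,T)/cov(L,G) is the Wald ratio estimator commonly used in Mendelian randomization, which the text above does not itself specify. Because the genotype is assigned at random, the ratio is unaffected by the confounder, whereas the ordinary slope of T on G is not; adding a direct L → T (pleiotropic) effect biases the ratio.

```python
import random


def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def simulate(n, causal_effect, pleiotropy, seed):
    """Simulate L -> G -> T with a hidden confounder of G and T.

    All coefficients below are arbitrary, assumed values for illustration.
    """
    rng = random.Random(seed)
    L, G, T = [], [], []
    for _ in range(n):
        l = rng.randint(0, 1) + rng.randint(0, 1)  # allele count 0/1/2, assigned at random
        h = rng.gauss(0, 1)                        # unmeasured confounder
        g = 1.0 * l + 1.0 * h + rng.gauss(0, 1)    # molecular trait
        t = causal_effect * g + 1.0 * h + pleiotropy * l + rng.gauss(0, 1)
        L.append(l)
        G.append(g)
        T.append(t)
    return L, G, T


def naive_slope(G, T):
    """Ordinary regression slope of T on G (ignores confounding)."""
    return cov(G, T) / cov(G, G)


def wald_ratio(L, G, T):
    """Genotype-based estimate of the effect of G on T."""
    return cov(L, T) / cov(L, G)


# No pleiotropy: the genotype-based ratio should sit near the true effect (0.5).
L, G, T = simulate(50_000, causal_effect=0.5, pleiotropy=0.0, seed=1)
print("naive slope:", round(naive_slope(G, T), 3))
print("Wald ratio: ", round(wald_ratio(L, G, T), 3))

# With a direct L -> T effect, the ratio no longer estimates the G -> T effect.
L, G, T = simulate(50_000, causal_effect=0.5, pleiotropy=0.5, seed=2)
print("Wald ratio with pleiotropy:", round(wald_ratio(L, G, T), 3))
```

Under these assumed parameters the confounded slope is expected to be near 0.9 and the pleiotropy-distorted ratio near 1.0, compared with a true effect of 0.5.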
Recently, though, formal statistical tests for inferring
causal relationships between quantitative traits mediated by
a common genetic locus have been developed [21]. To
understand how such a test works, consider marker genotypes
at a given DNA locus L that are correlated with
a given molecular phenotype, G, and a higher-order
phenotype T (Figure 26.3). The causal relationship G → T
is implied if three conditions are satisfied under the
assumption that L is sufficiently randomized: (1) L and G
are associated; (2) L and T are associated; and (3) L is
independent of T given G (T | G) [30]. If L is independent of
G | T, this is consistent with T → G, and if L is associated
with G | T, then this is consistent with G → T. We can boil all
of these observations down to four conditions from which
a statistical test can be formed to test for causality: (1) L
and T are associated; (2) L is associated with G | T; (3) G is
associated with T | L; and (4) L is independent of T | G. Each
of these conditions can be assessed with a corresponding
statistical test. For example, if we assume the marker
corresponding to locus L is biallelic, where $L_1$ and $L_2$ represent
indicator variables for the two alleles in a co-dominant
coding scheme, then the four conditions above can be tested
in the parameters of the following three regression models:

$$T_i = \alpha_1 + \beta_1 L_{1i} + \beta_2 L_{2i} + \varepsilon_{1i} \tag{1.1}$$
$$G_i = \alpha_2 + \beta_3 T_i + \beta_4 L_{1i} + \beta_5 L_{2i} + \varepsilon_{2i} \tag{1.2}$$
$$T_i = \alpha_3 + \beta_6 G_i + \beta_7 L_{1i} + \beta_8 L_{2i} + \varepsilon_{3i} \tag{1.3}$$

where the $\varepsilon_{ji}$ represent independently distributed
random noise variables with variance $\sigma_j$ [30]. Given these
models, the four component tests of interest are:

$$H_0: \{\beta_1, \beta_2\} = 0; \quad H_1: \{\beta_1, \beta_2\} \neq 0 \tag{1.4}$$
$$H_0: \{\beta_4, \beta_5\} = 0; \quad H_1: \{\beta_4, \beta_5\} \neq 0 \tag{1.5}$$
$$H_0: \beta_6 = 0; \quad H_1: \beta_6 \neq 0 \tag{1.6}$$
$$H_0: \{\beta_7, \beta_8\} \neq 0; \quad H_1: \{\beta_7, \beta_8\} = 0 \tag{1.7}$$

FIGURE 26.3 Given that two traits G and T are correlated in a given
population with changes in DNA at locus L, there are five basic causal
models to consider in testing the hypothesis that variations in trait G
cause variations in trait T. Here H denotes an unmeasured molecular or
higher-order trait.

The four conditions of interest can be tested using
standard F tests for linear model coefficients (conditions
1–3) and a slightly more involved test for the last condition,
given that it is an equivalence testing problem [21].
Given these individual statistical tests on the different
regression parameters, a causal inference test can then be
carried out by testing the strength of the chain of mathematical
conditions that collectively are consistent with
causal mediation (i.e., the strength of the chain is only as
strong as its weakest link, so that the intersection of the
rejection regions of the component tests provides for the
causality test we seek). For a series of statistical tests of size
$\alpha_r$ and rejection region $R_r$, the 'intersection-union' test with
rejection region $\bigcap R_r$ is a level $\sup(\alpha_r)$
test, so that the
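
The regression-based component tests and the intersection-union rule described above can be sketched in code. This is a minimal illustration, not the published implementation: the function names are hypothetical, the data are simulated with arbitrary effect sizes, and L1 and L2 are assumed to be 0/1 indicators for the heterozygous and alternate-homozygous genotype classes (one possible reading of the co-dominant coding). Only the three F tests are computed; the equivalence test for the fourth condition is not implemented, so a clearly labeled placeholder p-value stands in for it. Rejecting only when every component test rejects is equivalent to comparing the largest component p-value with the chosen level.

```python
import numpy as np
from scipy import stats


def nested_f_test(y, X_full, X_reduced):
    """p-value of the F test comparing a reduced linear model with a full one."""
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)

    df_num = X_full.shape[1] - X_reduced.shape[1]
    df_den = len(y) - X_full.shape[1]
    f_stat = ((rss(X_reduced) - rss(X_full)) / df_num) / (rss(X_full) / df_den)
    return float(stats.f.sf(f_stat, df_num, df_den))


def f_test_p_values(L1, L2, G, T):
    """p-values for the three F-test conditions (L~T, L~G|T, G~T|L)."""
    one = np.ones(len(T))
    # L associated with T: drop both genotype terms from the model for T.
    p1 = nested_f_test(T, np.column_stack([one, L1, L2]), np.column_stack([one]))
    # L associated with G given T: drop both genotype terms from the model for G.
    p2 = nested_f_test(G, np.column_stack([one, T, L1, L2]),
                       np.column_stack([one, T]))
    # G associated with T given L: drop G from the model for T.
    p3 = nested_f_test(T, np.column_stack([one, G, L1, L2]),
                       np.column_stack([one, L1, L2]))
    return p1, p2, p3


def intersection_union_p(p_values):
    """Intersection-union p-value: the largest of the component p-values."""
    return max(p_values)


# Simulated data in which the genotype acts on T only through G
# (assumed, arbitrary effect sizes).
rng = np.random.default_rng(0)
n = 2000
genotype = rng.binomial(2, 0.4, size=n)      # 0, 1 or 2 copies of an allele
L1 = (genotype == 1).astype(float)           # assumed heterozygote indicator
L2 = (genotype == 2).astype(float)           # assumed homozygote indicator
G = 0.8 * genotype + rng.normal(size=n)
T = 0.7 * G + rng.normal(size=n)

p_components = f_test_p_values(L1, L2, G, T)
p4_placeholder = 0.04  # hypothetical stand-in for the equivalence-test p-value
print("F-test p-values:", p_components)
print("combined p-value:", intersection_union_p([*p_components, p4_placeholder]))
```

Because the combined p-value is the maximum, a single weak link (for example, a non-significant L and T association) is enough to prevent a causal call, mirroring the "weakest link" description in the text.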