[Figure 2.7: three network diagrams over the nodes MECH, VECT, ALG, ANL, and STAT.
Panels: "Group A" (bn.A), "Group B" (bn.B), and "With Latent Grouping" (bn.LAT,
which also contains the node LAT).]
Fig. 2.7 Bayesian networks learned from each group of students (left and center) and the network
learned from the whole discretized data set after the inclusion of the latent variable LAT
These assumptions are difficult to verify in real-world settings, as the set of
potential confounding factors is usually not known. At best we can address this
issue, along with selection bias, by implementing a carefully planned experimental
design.
Furthermore, even when dealing with interventional data collected from a
controlled experiment (where we can set the value of at least some variables and
observe the resulting changes), there are usually multiple equivalent network
structures that represent reasonable causal models. Many arcs may not have a definite
direction, resulting in substantially different networks. When the sample size is
small, there may also be several non-equivalent networks fitting the data equally
well. Therefore, in general we are not able to identify a single “best” causal
network but rather a small set of likely causal networks that fit our knowledge of the
data.
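The point about equivalent structures can be checked directly. The following is a minimal sketch, assuming the bnlearn package and its marks data set are available; the two DAGs are hypothetical structures chosen for illustration, not networks discussed in the text. They differ only in the direction of one arc, so they belong to the same equivalence class and the data cannot tell them apart.

```r
library(bnlearn)
data(marks)

# Two hypothetical DAGs that differ only in the direction of the
# arc between MECH and VECT (illustrative, not learned from data).
dag1 = model2network("[MECH][VECT|MECH][ALG][ANL][STAT]")
dag2 = model2network("[VECT][MECH|VECT][ALG][ANL][STAT]")

# The Gaussian BIC is score equivalent, so both DAGs should
# receive the same score on the same data.
s1 = score(dag1, marks, type = "bic-g")
s2 = score(dag2, marks, type = "bic-g")
all.equal(s1, s2)

# Both DAGs map to the same CPDAG, in which the arc between
# MECH and VECT is left undirected.
same.class = isTRUE(all.equal(cpdag(dag1), cpdag(dag2)))
undirected.arcs(cpdag(dag1))
```

Any arc that is still undirected in the CPDAG is one whose direction cannot be determined from observational data alone.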
An example of the bias introduced by the presence of a latent variable was
illustrated by Edwards (2000) using the marks data. He noted that if we assume that
the students belong to two different groups (which we will call A and B) and assign
each student to one of them using the EM algorithm (MacLachlan and Krishnan,
2008), the networks learned from the two groups identify different sets of
relationships between the five topics.
> latent = factor(c(rep("A", 44), "B",
+ rep("A", 7), rep("B", 36)))
> bn.A = hc(marks[latent == "A", ])
> bn.B = hc(marks[latent == "B", ])
> modelstring(bn.A)
[1] "[MECH][ALG|MECH][VECT|ALG][ANL|ALG][STAT|ALG:ANL]"
> modelstring(bn.B)
[1] "[MECH][ALG][ANL][STAT][VECT|MECH]"
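The difference between the two group networks can also be quantified. The sketch below is an illustration under stated assumptions, not part of the original analysis: it repeats the grouping from the session above so that it runs on its own, and then uses the bnlearn functions compare() and shd() to count the arcs the two structures share and the arcs on which they differ. The exact counts depend on the structures returned by hc() in the installed bnlearn version, so no output is shown.

```r
library(bnlearn)
data(marks)

# Group assignment as in the session above (88 students in total).
latent = factor(c(rep("A", 44), "B",
                  rep("A", 7), rep("B", 36)))
table(latent)

# Learn one network per group with hill climbing.
bn.A = hc(marks[latent == "A", ])
bn.B = hc(marks[latent == "B", ])

# Arcs found in each group.
arcs(bn.A)
arcs(bn.B)

# Arcs in common (tp) and arcs present in only one network (fp, fn).
cmp = compare(target = bn.A, current = bn.B)
cmp

# Structural Hamming distance between the two equivalence classes.
d = shd(bn.A, bn.B)
d
```

A large distance relative to the number of arcs suggests that pooling the two groups would mix two different dependence structures.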
 