(b) Fill the nodes with different colors according to their role in the diagnostic
process: causes (“visit to Asia” and “smoking”), effects (“tuberculosis,”
“lung cancer,” and “bronchitis”), and the diagnosis proper (“chest X-ray,”
“dyspnea,” and “either tuberculosis or lung cancer/bronchitis”).
(c) Explore different layouts by changing the layout and shape arguments.
(a) > vs = vstructs(bn.eq, arcs = TRUE)
> graphviz.plot(bn.eq, highlight =
+ list(arcs = vs, lwd = 2, col = "grey"))
(b) > graphviz.plot(bn.eq,
+ highlight = list(nodes = nodes(bn),
+ fill = c("blue", "red", "green", "green", "red",
+ "blue", "red", "green"), col = "black"))
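The fill vector in (b) is positional, so it only matches the intended roles if the nodes come back in the expected order. A minimal sketch of building the same vector by name is given here; the single-letter node labels are an assumption (those of the asia data set shipped with bnlearn), and so is the alphabetical ordering used as a stand-in for nodes(bn).

```r
# Node labels are an assumption: the single-letter names of the asia
# data set shipped with bnlearn (A = visit to Asia, S = smoking,
# T = tuberculosis, L = lung cancer, B = bronchitis, E = either,
# X = chest X-ray, D = dyspnea).
role = c(A = "cause", S = "cause",
         T = "effect", L = "effect", B = "effect",
         X = "diagnosis", D = "diagnosis", E = "diagnosis")

# One color per role, using the same three colors as the answer to (b).
role.colors = c(cause = "blue", effect = "red", diagnosis = "green")

# Stand-in for nodes(bn); alphabetical order is assumed here.
node.order = sort(names(role))

# Look up each node's role, then that role's color.
fill = unname(role.colors[role[node.order]])
print(data.frame(node = node.order, role = unname(role[node.order]),
  fill = fill))

# With bnlearn and Rgraphviz loaded, the vector could then be passed as:
# graphviz.plot(bn.eq, highlight = list(nodes = node.order,
#   fill = fill, col = "black"))
```

Keeping the role assignment in a named vector means that a change in node order, or in the color chosen for a role, does not require rewriting the fill vector by hand.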
(c) > par(mfrow = c(2, 5))
> layout = c("dot", "neato", "twopi", "circo",
+   "fdp")
> shape = c("ellipse", "circle")
> for (l in layout) {
+   for (s in shape) {
+     main = paste(l, s)
+     graphviz.plot(bn.eq, shape = s, layout = l,
+       main = main)
+   }
+ }
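As a possible variation on (c), the layout and shape combinations can be enumerated up front, so that the panel grid and the titles follow from the two vectors instead of being hard-coded. This is only a sketch: the names combos, titles and grid.dims are introduced here for illustration, and the plotting calls are left as comments because they need bnlearn, Rgraphviz and the bn.eq object from the exercise.

```r
layout = c("dot", "neato", "twopi", "circo", "fdp")
shape = c("ellipse", "circle")

# All layout/shape pairs; the first column varies fastest, so the order
# matches a loop over layouts with an inner loop over shapes.
combos = expand.grid(shape = shape, layout = layout,
  stringsAsFactors = FALSE)
titles = paste(combos$layout, combos$shape)

# Panel grid sized from the two vectors, so it still holds every
# combination if either vector changes.
grid.dims = c(length(shape), length(layout))
print(titles)

# With bnlearn and Rgraphviz loaded, the plots could then be drawn as:
# par(mfrow = grid.dims)
# for (i in seq_len(nrow(combos)))
#   graphviz.plot(bn.eq, shape = combos$shape[i],
#     layout = combos$layout[i], main = titles[i])
```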
2.3 Consider the marks data set analyzed in Sect. 2.3.
(a) Discretize the data using a quantile transform and different numbers of
intervals (say, from 2 to 5). How does the network structure learned from
the resulting data sets change as the number of intervals increases?
(b) Repeat the exercise with interval discretization, again using up to 5
intervals, and compare the resulting networks with the ones obtained previously
with quantile discretization.
(c) Does Hartemink's discretization algorithm perform better than either
quantile or interval discretization? How does its behavior depend on the
number of initial breaks?
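To make the difference between the two marginal schemes in (a) and (b) concrete, here is a small sketch in base R. The sample x is hypothetical (it is not the marks data), and quantile() and cut() are used only to illustrate the idea; the discretize() function in bnlearn may differ in details such as the handling of ties.

```r
# Hypothetical skewed sample, used only for illustration.
x = c(1, 2, 2, 3, 3, 4, 5, 8, 13, 40)
k = 2  # number of intervals

# Quantile discretization: break points at empirical quantiles, so each
# interval holds roughly the same number of observations.
q.breaks = quantile(x, probs = seq(0, 1, length.out = k + 1))
x.quantile = cut(x, breaks = q.breaks, include.lowest = TRUE)

# Interval discretization: k intervals of equal width over the range.
i.breaks = seq(min(x), max(x), length.out = k + 1)
x.interval = cut(x, breaks = i.breaks, include.lowest = TRUE)

print(table(x.quantile))
print(table(x.interval))
```

On this sample the quantile scheme splits the observations evenly, while the equal-width scheme puts all but the largest value in the first interval. Both schemes look at one variable at a time.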
(a) As the number of intervals increases, fewer and fewer arcs are included in the
network. This is a consequence of the loss of information resulting from
discretizing variables one at a time, without considering their joint distribution.
> intervals = 2:5
> par(mfrow = c(1, length(intervals)))
> for (int in intervals) {
+   dmarks = discretize(marks, breaks = int,
+     method = "quantile")
+   main = paste("dmarks,", int, "intervals")