(b) Use bootstrap resampling to evaluate the distribution of the number of arcs
present in each of the networks learned in the previous point. Do they differ
significantly?
(c) Compute the averaged network structure for sachs using hill-climbing
with BGe and different imaginary sample sizes. How does the value of the
significance threshold change as iss increases?
(a) The network learned with BGe appears to fit the data better than the one
learned with BIC, but not by a wide margin. We therefore need to repeat
cross-validation a suitable number of times before concluding that the
difference is significant.
> sachs = read.table("sachs.data.txt",
+ header = TRUE)
> bn.bic = hc(sachs, score = "bic-g")
> bn.cv(bn.bic, data = sachs)
> bn.bge = hc(sachs, score = "bge")
> bn.cv(bn.bge, data = sachs)
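The repeated cross-validation mentioned in (a) might be sketched as follows. This is a minimal illustration, not the solution from the text: it assumes the `runs` argument of `bn.cv()` and the `loss()` extractor behave as described in the bnlearn documentation, and it uses the `gaussian.test` data set bundled with bnlearn as a stand-in so that the block runs without `sachs.data.txt`. The choice of 10 runs and of a Wilcoxon rank-sum test are assumptions as well.

```r
library(bnlearn)

# Stand-in data (assumption): replace with the sachs data frame read above.
data(gaussian.test)
dat = gaussian.test

set.seed(42)

# Learn one structure per score, as in the transcript above.
bn.bic = hc(dat, score = "bic-g")
bn.bge = hc(dat, score = "bge")

# runs = 10 repeats 10-fold cross-validation ten times for each network.
cv.bic = bn.cv(dat, bn.bic, runs = 10)
cv.bge = bn.cv(dat, bn.bge, runs = 10)

# loss() returns one expected-loss estimate per run.
loss.bic = loss(cv.bic)
loss.bge = loss(cv.bge)

# Compare the two sets of losses; lower log-likelihood loss is better.
summary(loss.bic)
summary(loss.bge)
wilcox.test(loss.bic, loss.bge)
```

With the sachs data in place of the stand-in, a clear separation between the two sets of losses would support the claim that one score yields a better fit; heavily overlapping values would not.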
(b) The distributions of the number of arcs for BIC and BGe differ in two
important ways. First, the latter is bell-shaped, while the former is markedly
asymmetric. Second, the means and standard deviations of the two distributions
differ (the exact values depend on the bootstrap samples, so they change with
each new simulation).
> narcs.bic =
+ bn.boot(sachs, algorithm = "hc",
+ algorithm.args = list(score = "bic-g"),
+ statistic = narcs)
> narcs.bge =
+ bn.boot(sachs, algorithm = "hc",
+ algorithm.args = list(score = "bge"),
+ statistic = narcs)
> narcs.bic = unlist(narcs.bic)
> narcs.bge = unlist(narcs.bge)
> par(mfrow = c(1, 2))
> hist(narcs.bic, main = "BIC", freq = FALSE)
> curve(dnorm(x, mean = mean(narcs.bic),
+ sd = sd(narcs.bic)), add = TRUE, col = 2)
> hist(narcs.bge, main = "BGe", freq = FALSE)
> curve(dnorm(x, mean = mean(narcs.bge),
+ sd = sd(narcs.bge)), add = TRUE, col = 2)
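The histograms give a visual comparison; a rough numerical check could complement them. The helper below is hypothetical (`compare.narcs` is not part of bnlearn) and uses only base R: it reports the mean and standard deviation of two vectors of arc counts and runs a Wilcoxon rank-sum test, which is one possible choice of test and not the one prescribed by the exercise.

```r
# Hypothetical helper: summarise and compare two vectors of arc counts,
# such as narcs.bic and narcs.bge from the transcript above.
compare.narcs = function(narcs.a, narcs.b) {
  list(
    mean = c(a = mean(narcs.a), b = mean(narcs.b)),
    sd = c(a = sd(narcs.a), b = sd(narcs.b)),
    # exact = FALSE: arc counts are integers, so ties are expected.
    test = wilcox.test(narcs.a, narcs.b, exact = FALSE)
  )
}

# Usage, once the bootstrap vectors exist:
# compare.narcs(narcs.bic, narcs.bge)
```

Since the results depend on the bootstrap samples, the reported values will change from one simulation to the next, just like the histograms.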
(c)
> t = numeric(5)
> iss = c(5, 10, 20, 50, 100)
> for (i in seq_along(iss)) {
+ s = boot.strength(sachs, algorithm = "hc",
+ algorithm.args = list(score = "bge",