Fig. 5.5 Performance of bootstrap resampling for different numbers of slave processes, measured by execution time (in seconds)
of the 95 % confidence interval are very close to the mean value. This is a consequence of the large sample size of hailfinder (20,000 observations) compared to the number of parameters of the network (1,768) learned by hc.
It is easy to show that the embarrassingly parallel nature of bootstrap resampling
results in substantial performance improvements:
> system.time(bn.boot(hailfinder, algorithm = "hc",
+ R = 200, statistic = narcs))
user system elapsed
1103.585 1.216 1104.848
> cl = makeCluster(2, type = "MPI")
> system.time(bn.boot(hailfinder, algorithm = "hc",
+ R = 200, statistic = narcs, cluster = cl))
user system elapsed
0.292 0.040 586.009
> stopCluster(cl)
Adding more slaves further reduces the execution time, at least up to a cluster of 6
processes (see Fig. 5.5 ). Using a larger number of slave processes does not result in
additional speedups, at least for this number of bootstrap samples.
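The scaling behavior shown in Fig. 5.5 can be reproduced with a sketch along the following lines. As an assumption, a PSOCK cluster from the parallel package is used in place of MPI (so the example runs without an MPI installation), and the number of bootstrap replicates is reduced from 200 to keep the run short:

```r
# Sketch: timing bn.boot() for increasing numbers of slave processes,
# mirroring Fig. 5.5. PSOCK clusters replace MPI here (an assumption,
# to avoid requiring Rmpi), and R is reduced from 200 for brevity.
library(bnlearn)
library(parallel)

data(hailfinder)

timings = sapply(1:6, function(slaves) {

  cl = makeCluster(slaves)
  t = system.time(bn.boot(hailfinder, algorithm = "hc",
                          R = 20, statistic = narcs, cluster = cl))
  stopCluster(cl)
  t["elapsed"]

})
names(timings) = 1:6
print(timings)   # elapsed seconds for each cluster size
```

As in the text, the elapsed times should decrease up to a point and then level off once the per-slave workload becomes too small to offset the communication overhead.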
5.4.2 Cross-Validation
Cross-validation is probably the simplest and most widely used method to validate
statistical models and to select suitable values for their tuning parameters. It has
also been applied to many classes of models, from regression to classification, to
estimate loss functions (such as classification error or likelihood loss) for model
selection. Several examples of such applications are covered in Hastie et al. (2009).
Similarly to the bootstrap, cross-validation is embarrassingly parallel. Once the
data have been partitioned in k parts and the k cross-validation samples X1, ..., Xk
 