Fig. 5.5 Performance of bootstrap resampling for different numbers of slave processes, measured by execution time (in seconds)
of the 95 % confidence interval are very close to the mean value. This is a consequence of the large sample size of hailfinder (20,000 observations) compared to the number of parameters of the network (1,768) learned by hc.
It is easy to show that the embarrassingly parallel nature of bootstrap resampling
results in substantial performance improvements:
> system.time(bn.boot(hailfinder, algorithm = "hc",
+ R = 200, statistic = narcs))
user system elapsed
1103.585 1.216 1104.848
> cl = makeCluster(2, type = "MPI")
> system.time(bn.boot(hailfinder, algorithm = "hc",
+ R = 200, statistic = narcs, cluster = cl))
user system elapsed
0.292 0.040 586.009
> stopCluster(cl)
Adding more slaves further reduces the execution time, at least up to a cluster of 6
processes (see Fig. 5.5 ). Using a larger number of slave processes does not result in
additional speedups, at least for this number of bootstrap samples.
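The scaling behavior shown in Fig. 5.5 can be reproduced with a sketch along the following lines. As an assumption, a PSOCK cluster from the parallel package is used in place of MPI (so the example runs without an MPI installation), and the number of bootstrap replicates is reduced from 200 to keep the run short:

```r
# Sketch: timing bn.boot() for increasing numbers of slave processes,
# mirroring Fig. 5.5. PSOCK clusters replace MPI here (an assumption,
# to avoid requiring Rmpi), and R is reduced from 200 for brevity.
library(bnlearn)
library(parallel)

data(hailfinder)

timings = sapply(1:6, function(slaves) {

  cl = makeCluster(slaves)
  t = system.time(bn.boot(hailfinder, algorithm = "hc",
                          R = 20, statistic = narcs, cluster = cl))
  stopCluster(cl)
  t["elapsed"]

})
names(timings) = 1:6
print(timings)   # elapsed seconds for each cluster size
```

As in the text, the elapsed times should decrease up to a point and then level off once the per-slave workload becomes too small to offset the communication overhead.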
5.4.2 Cross-Validation
Cross-validation is probably the simplest and most widely used method to validate
statistical models and to select suitable values for their tuning parameters. It has
also been applied to many classes of models, from regression to classification, to
estimate loss functions (such as classification error or likelihood loss) for model
selection. Several examples of such applications are covered in Hastie et al. (2009).
Similarly to the bootstrap, cross-validation is embarrassingly parallel. Once the
data have been partitioned in k parts and the k cross-validation samples X1, ..., Xk
 