Fig. 5.1 Parallel implementation of the Grow-Shrink algorithm in bnlearn
3. Given the Markov blankets and the neighborhoods, the v-structures centered on
a particular node (i.e., the one with the converging arcs) can again be identified
in parallel. As in the previous step, the consistency of the neighborhoods must be
checked and any departure from symmetry must be fixed beforehand.
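The symmetry check mentioned above can be sketched as follows. This is an illustrative fragment, not bnlearn's internal code: it applies the AND rule, keeping a node in the neighborhood of another only when the relationship is reciprocated.

```r
## Illustrative sketch (not bnlearn's actual implementation): enforce the
## symmetry of estimated neighborhoods with the AND rule, so that x stays
## in nbr[[y]] only if y is also in nbr[[x]].
fix.symmetry <- function(nbr) {

  for (x in names(nbr))
    nbr[[x]] <- Filter(function(y) x %in% nbr[[y]], nbr[[x]])

  nbr

}

## Toy example: C lists no neighbors, so the asymmetric A-C link is dropped.
nbr <- list(A = c("B", "C"), B = c("A"), C = character(0))
fix.symmetry(nbr)
```

The OR rule (keeping a link when either node lists the other) is the obvious alternative; which correction is appropriate depends on the learning algorithm.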
Furthermore, the final step of the Grow-Shrink algorithm, in which the directions
of compelled arcs are learned, also exhibits fine-grained parallelism. The order
in which arcs are considered in that step depends on the topology of the graph;
undirected arcs whose orientations would result in the greatest number of cycles are
considered first. That number can be computed in parallel for each arc, at the cost
of introducing some overhead.
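Since each arc's cycle count depends only on the current graph, the counts can be computed independently on the slaves. The fragment below is a sketch of this idea using the parallel package; count.cycles is a hypothetical placeholder for the real cycle-counting routine, which is not shown here.

```r
## Sketch of the fine-grained parallelism described above. count.cycles() is
## a hypothetical helper standing in for the routine that counts how many
## cycles orienting one undirected arc would introduce; here it is a stub.
library(parallel)

## arcs: a two-column matrix of undirected arcs (from, to).
arcs <- matrix(c("A", "B", "B", "C"), ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("from", "to")))

## Placeholder cycle counter, for illustration only.
count.cycles <- function(arc, graph) 0

cl <- makeCluster(2)
clusterExport(cl, "count.cycles")
## Each row (arc) is scored independently on the slaves.
cycles <- parApply(cl, arcs, 1, count.cycles, graph = NULL)
stopCluster(cl)

## Arcs are then considered in decreasing order of their cycle counts.
ord <- order(cycles, decreasing = TRUE)
```

The overhead mentioned above comes from shipping the graph to the slaves and collecting the counts, which can dominate when the per-arc computation is cheap.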
We will now examine the practical implications of parallelizing a constraint-
based learning algorithm. To that end, we will use the hailfinder data set in-
cluded in bnlearn , which is generated from the reference network of the same name.
Hailfinder is a Bayesian network designed by Abramson et al. (1996) to forecast
severe summer hail in northeastern Colorado. The network contains 56 variables,
and the data set comprises 20,000 observations; together they are large enough to
properly highlight the advantages and the limitations of parallel computing.
Consider a simple cluster with two slave processes.
> data(hailfinder)
> cl = makeCluster(2, type = "MPI")
2 slaves are spawned successfully. 0 failed.
> res = gs(hailfinder, cluster = cl)
> unlist(clusterEvalQ(cl, .test.counter))
[1] 2698 3765
> .test.counter
[1] 4
> stopCluster(cl)
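The counters above show that the two slave processes performed 2698 and 3765 conditional independence tests respectively, while the master process executed only 4. A rough way to gauge the resulting speedup is to time the same learning run with and without the cluster. The snippet below is a sketch using a PSOCK cluster (rather than the MPI cluster shown above) for portability; the timings will of course vary from machine to machine.

```r
## Compare serial and parallel runs of the Grow-Shrink algorithm on the
## hailfinder data; a rough benchmark, not a rigorous one.
library(parallel)
library(bnlearn)

data(hailfinder)
cl <- makeCluster(2)

serial.time <- system.time(gs(hailfinder))
cluster.time <- system.time(gs(hailfinder, cluster = cl))

stopCluster(cl)
print(rbind(serial = serial.time, cluster = cluster.time))
```

Because the conditional independence tests on hailfinder are relatively cheap, communication overhead can eat into the speedup; the gains grow with the cost of the individual tests.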