Parallel Computing for Bayesian Networks - Bayesian Networks in R: With Applications in Systems Biology

We will now illustrate the use of this set of packages, which will then be used throughout this chapter to show how parallel computing applies to Bayesian networks.

The first step is to load the snow and the rsprng packages.

> library(snow)
> library(rsprng)

The Rmpi and rpvm packages are loaded by snow as needed. Subsequently, we need to spawn the slave processes and initialize the cluster with the makeCluster function.

> cl = makeCluster(2, type = "MPI")
Loading required package: Rmpi

The first argument of makeCluster specifies the number of slave processes to spawn; this is usually between 2 and the number of processes that can run concurrently without overcommitting any hardware resource. The second argument specifies the communication mechanism used between the master and the slave processes; possible values are "SOCK" to use sockets (the default), "MPI" to use Rmpi, and "PVM" to use rpvm.
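Since version 2.14.0, R also ships the parallel package, which includes revised copies of snow's cluster functions (makeCluster, stopCluster, clusterEvalQ, clusterExport, parApply and friends) with essentially the same interface. The following is a minimal sketch, not the chapter's own code, that uses parallel with a socket ("PSOCK") cluster instead of MPI so that it runs without an MPI installation; asking each worker for its process ID is just one illustrative way to confirm that two separate R processes were spawned.

```r
library(parallel)

# Spawn two worker ("slave") R processes on the local machine,
# communicating with the master over sockets rather than MPI.
cl <- makeCluster(2, type = "PSOCK")

# Each worker evaluates the expression in its own R session and
# returns its process ID; the results come back as a list.
worker_pids <- unlist(clusterEvalQ(cl, Sys.getpid()))
master_pid <- Sys.getpid()
print(worker_pids)

# Shut down the worker processes when they are no longer needed.
stopCluster(cl)
```

snow provides a stopCluster function with the same purpose, which should likewise be called once all the computations on a cluster are finished.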

Once the slave processes have been spawned, we can initialize their random number generators, so that each of them draws from its own independent stream of random numbers.

> clusterSetupSPRNG(cl)
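As a hedged sketch of the same idea without rsprng, the parallel package bundled with R offers clusterSetRNGStream, which gives each worker its own L'Ecuyer-CMRG stream (a different generator from SPRNG, swapped in here) derived from a single seed. The seed value 42 and the use of runif are arbitrary choices for illustration.

```r
library(parallel)

cl <- makeCluster(2, type = "PSOCK")

# Give every worker its own L'Ecuyer-CMRG random number stream,
# all derived from one seed so that the results are reproducible.
clusterSetRNGStream(cl, iseed = 42)
first_run <- clusterEvalQ(cl, runif(3))

# Resetting the streams with the same seed reproduces the same draws.
clusterSetRNGStream(cl, iseed = 42)
second_run <- clusterEvalQ(cl, runif(3))

stopCluster(cl)

# One element per worker; the two workers draw different numbers.
print(first_run)
```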

The setup of the cluster is now complete, and we can start using it to speed up our computations. For example, we can simultaneously compute the means of all the variables in the marks data we used in Chap. 2,

> parApply(cl, X = marks, MARGIN = 2, FUN = mean)

    MECH     VECT      ALG      ANL     STAT
38.95455 50.59091 50.60227 46.68182 42.30682

getting the same result as the call to colMeans we would have used to compute them sequentially.

> colMeans(marks)

    MECH     VECT      ALG      ANL     STAT
38.95455 50.59091 50.60227 46.68182 42.30682
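The comparison above can be reproduced without MPI or the marks data. This is a minimal sketch that assumes the parallel package bundled with R in place of snow, and uses four columns of the built-in mtcars data frame as a stand-in for marks (the name scores is hypothetical).

```r
library(parallel)

# Stand-in for the marks data: a few numeric columns of the
# built-in mtcars data frame.
scores <- mtcars[, c("mpg", "hp", "wt", "qsec")]

cl <- makeCluster(2, type = "PSOCK")

# Column means computed by the workers; MARGIN = 2 applies FUN to columns.
par_means <- parApply(cl, X = scores, MARGIN = 2, FUN = mean)

stopCluster(cl)

# The same quantities computed sequentially in the master process.
seq_means <- colMeans(scores)
print(rbind(parallel = par_means, sequential = seq_means))
```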

The parApply function, along with parLapply and parSapply, represents the most user-friendly way to set up embarrassingly parallel computations. These functions are the parallel versions of apply, lapply, and sapply and work in exactly the same way from the user's point of view.
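To make the correspondence concrete, here is a small sketch, again assuming the parallel package bundled with R rather than snow, showing that parLapply returns a list just like lapply, while parSapply simplifies its result just like sapply. The list samples is a hypothetical input chosen only for illustration.

```r
library(parallel)

cl <- makeCluster(2, type = "PSOCK")

# A list of numeric vectors, one element per independent task.
samples <- list(a = c(1, 4, 9), b = c(2, 3), c = c(10, 20, 30, 40))

# parLapply returns a named list, exactly like lapply ...
par_list <- parLapply(cl, samples, range)
# ... and parSapply simplifies the result to a vector, like sapply.
par_vec <- parSapply(cl, samples, length)

stopCluster(cl)

print(par_list)
print(par_vec)
```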

Problems which are not embarrassingly parallel, or which cannot be divided into identical parts, can be tackled using a combination of clusterExport (to copy the data to the slave R processes) and clusterEvalQ (to make the slave processes execute arbitrary R commands). For instance, we may be interested in comparing Pearson's and Spearman's correlation matrices for the marks data, and we may want to estimate these matrices in parallel. To achieve that, we can first export the marks data to the slave processes,

> clusterExport(cl, list("marks"))
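The rest of the chapter's example is not reproduced here. As a hedged sketch of one way to finish the comparison, the code below exports a stand-in data set (four columns of the built-in mtcars, under the hypothetical name scores), uses clusterEvalQ to check that every worker received it, and then uses clusterApply, a function not mentioned in the text above, to hand a different correlation method to each worker. It assumes the parallel package bundled with R instead of snow.

```r
library(parallel)

# Stand-in for the marks data (four columns of mtcars).
scores <- mtcars[, c("mpg", "hp", "wt", "qsec")]

cl <- makeCluster(2, type = "PSOCK")

# Copy the data into the global environment of every worker.
clusterExport(cl, "scores", envir = environment())

# Each worker reports the dimensions of its own copy of the data.
print(clusterEvalQ(cl, dim(scores)))

# clusterApply sends the i-th method to the i-th worker, so with two
# methods and two workers each worker computes one correlation matrix.
cors <- clusterApply(cl, c("pearson", "spearman"),
                     function(m) cor(scores, method = m))
names(cors) <- c("pearson", "spearman")

stopCluster(cl)

# Differences between the two estimates, rounded for readability.
print(round(cors$pearson - cors$spearman, 3))
```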
