5.3.4 Parameter Learning
Parameter learning is another embarrassingly parallel problem. Once the structure
of the network is known, the decomposition of the global distribution into the local
distributions provides a natural way to split the estimation of the parameters among
the slaves. The distribution of each node depends only the values of its parents and
has a limited number of parameters; therefore, the amount of data copied to and
from the slave processes is very small. Furthermore, assigning one variable at a
time to a slave process allows an efficient use of a large number of processors.
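The node-by-node split described above can be sketched as follows. This is a minimal illustration, not the implementation of any particular package: the toy data set, the parent map, and the use of a process pool are all assumptions, and a real implementation would copy only each node's relevant columns to its slave rather than share the whole table.

```python
# Minimal sketch: maximum likelihood estimation of each local distribution,
# with one node assigned to each slave process. Toy data and structure.
from collections import Counter
from multiprocessing import Pool

# One dict per observation; all variables are discrete.
DATA = [
    {"A": 0, "B": 0, "C": 0},
    {"A": 0, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 0, "C": 0},
]
# The structure of the network is assumed known.
PARENTS = {"A": [], "B": ["A"], "C": ["A", "B"]}

def estimate_node(node):
    """Relative frequencies of the node's values, one distribution
    for each observed configuration of its parents."""
    counts, totals = Counter(), Counter()
    for row in DATA:
        cfg = tuple(row[p] for p in PARENTS[node])
        counts[(cfg, row[node])] += 1
        totals[cfg] += 1
    return node, {key: n / totals[key[0]] for key, n in counts.items()}

if __name__ == "__main__":
    with Pool() as pool:                   # one task per variable
        cpts = dict(pool.map(estimate_node, PARENTS))
```

Because each task touches only one node's parents, the data sent to and returned from each slave stays small, as noted above.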
Despite all these desirable properties, the parallel estimation of the parameters
does not provide real practical advantages. First, in many “small n, large p” settings,
the variables outnumber the observations. In these cases, the overhead of copying
the data to the slaves is greater than the speed boost provided by parallel estimation.
Second, the number of parameters is not homogeneous among the nodes. In many
networks learned from biological data, a small number of nodes have a large number
of incoming arcs; typically, these correspond to key factors in the experimental
setting. Such nodes account for a large share of the parameters of the network; for
example, in discrete data the number of configurations increases rapidly with the
number of parents. This disparity introduces additional inefficiencies in the parallel
execution, because some slaves will require much more time to complete their part
of the estimation.
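The load imbalance can be made concrete with a small count. For a discrete node, the number of free parameters is (number of levels − 1) for each configuration of the parents, so it grows with the product of the parents' level counts; the helper below is illustrative only.

```python
# Free parameters of a discrete node: (levels - 1) probabilities for each
# configuration of its parents. The all-binary case shows the rapid growth.
def n_parameters(levels, parent_levels):
    q = 1
    for r in parent_levels:
        q *= r          # number of parent configurations
    return (levels - 1) * q

# A binary node with 2 binary parents vs. one with 10 binary parents:
assert n_parameters(2, [2] * 2) == 4
assert n_parameters(2, [2] * 10) == 1024
```

A slave assigned the second node has 256 times as many parameters to estimate as one assigned the first, which is the disparity discussed above.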
It is also important to note that parameter estimation is efficient in terms of
computational complexity compared to most other problems concerning Bayesian
networks, in both structure learning and inference. Both discrete and Gaussian
Bayesian networks have closed-form estimators that can be computed in linear time
(in the sample size) for the respective parameters. For this reason, the reduction in
the execution time resulting from a parallel implementation is likely to be negligible
over the whole analysis.
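To illustrate why a parallel implementation has little to gain here, consider the Gaussian case: the parameters of a node with a single parent have a closed-form estimate (the least-squares intercept and slope) computable from sums accumulated in one pass over the data, i.e. in linear time in the sample size. The function below is a sketch of that single-parent case, not a general implementation.

```python
# Closed-form, single-pass estimate of a Gaussian node with one parent:
# y = mu + beta * x + noise. Cost is linear in the sample size.
def fit_gaussian_node(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    beta = (sxy - sx * sy / n) / (sxx - sx * sx / n)  # slope
    mu = (sy - beta * sx) / n                         # intercept
    return mu, beta

# Data generated exactly as y = 1 + 2x, so the estimates recover (1, 2):
mu, beta = fit_gaussian_node([0, 1, 2, 3], [1, 3, 5, 7])
```

With per-node costs this low, the fixed overhead of spawning slaves and copying data dominates, which is the point made above.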
5.4 Applications to Inference Procedures
Inference on Bayesian networks can be performed using a variety of techniques,
some specific to Bayesian networks (see Chap. 4 ), some defined in more general
settings. Exploring applications of parallel computing to such a wide range of tech-
niques would be impossible in the space of this chapter. For this reason, we will con-
centrate only on three common inference techniques: bootstrap , cross-validation ,
and conditional probability queries .
5.4.1 Bootstrap
Bootstrap is a very general tool for investigating probability distributions. It is
also embarrassingly parallel, because bootstrap samples are mutually independent.
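The independence of bootstrap samples maps directly onto a pool of slaves: each worker resamples the data with replacement and computes its statistic with no communication. The sketch below uses the sample mean as a stand-in statistic; the data, the statistic, and the seeding scheme are all illustrative assumptions.

```python
# Sketch of an embarrassingly parallel bootstrap: each worker draws one
# resample (with replacement) and evaluates the statistic independently.
import random
from multiprocessing import Pool

DATA = [1.0, 2.0, 2.5, 3.0, 4.5, 5.0]

def one_replicate(seed):
    rng = random.Random(seed)                 # independent stream per task
    resample = [rng.choice(DATA) for _ in DATA]
    return sum(resample) / len(resample)      # statistic: the sample mean

if __name__ == "__main__":
    with Pool() as pool:
        replicates = pool.map(one_replicate, range(200))
    # replicates approximates the sampling distribution of the mean
```

Seeding each task separately keeps the replicates reproducible while preserving their mutual independence.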