5.3.4 Parameter Learning
Parameter learning is another embarrassingly parallel problem. Once the structure
of the network is known, the decomposition of the global distribution into the local
distributions provides a natural way to split the estimation of the parameters among
the slaves. The distribution of each node depends only the values of its parents and
has a limited number of parameters; therefore, the amount of data copied to and
from the slave processes is very small. Furthermore, assigning one variable at a
time to a slave process allows an efficient use of a large number of processors.
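The node-by-node split described above can be sketched as follows. This is a minimal illustration, not the implementation of any particular package: the toy data set, the parent map, and the use of a process pool are all assumptions, and a real implementation would copy only each node's relevant columns to its slave rather than share the whole table.

```python
# Minimal sketch: maximum likelihood estimation of each local distribution,
# with one node assigned to each slave process. Toy data and structure.
from collections import Counter
from multiprocessing import Pool

# One dict per observation; all variables are discrete.
DATA = [
    {"A": 0, "B": 0, "C": 0},
    {"A": 0, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 0, "C": 0},
]
# The structure of the network is assumed known.
PARENTS = {"A": [], "B": ["A"], "C": ["A", "B"]}

def estimate_node(node):
    """Relative frequencies of the node's values, one distribution
    for each observed configuration of its parents."""
    counts, totals = Counter(), Counter()
    for row in DATA:
        cfg = tuple(row[p] for p in PARENTS[node])
        counts[(cfg, row[node])] += 1
        totals[cfg] += 1
    return node, {key: n / totals[key[0]] for key, n in counts.items()}

if __name__ == "__main__":
    with Pool() as pool:                   # one task per variable
        cpts = dict(pool.map(estimate_node, PARENTS))
```

Because each task touches only one node's parents, the data sent to and returned from each slave stays small, as noted above.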
Despite all these desirable properties, the parallel estimation of the parameters
does not provide real practical advantages. First, in many “small n, large p” settings,
the variables outnumber the observations. In these cases, the overhead of copying
the data to the slaves is greater than the speed boost provided by parallel estimation.
Second, the number of parameters is not homogeneous among the nodes. In many
networks learned from biological data, a small number of nodes have a large number
of incoming arcs; typically, these correspond to key factors in the experimental
setting. Such nodes account for a large share of the parameters of the network; for
example, in discrete data the number of configurations increases rapidly with the
number of parents. This disparity introduces additional inefficiencies in the parallel
execution, because some slaves will require much more time to complete their part
of the estimation.
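The load imbalance can be made concrete with a small count. For a discrete node, the number of free parameters is (number of levels − 1) for each configuration of the parents, so it grows with the product of the parents' level counts; the helper below is illustrative only.

```python
# Free parameters of a discrete node: (levels - 1) probabilities for each
# configuration of its parents. The all-binary case shows the rapid growth.
def n_parameters(levels, parent_levels):
    q = 1
    for r in parent_levels:
        q *= r          # number of parent configurations
    return (levels - 1) * q

# A binary node with 2 binary parents vs. one with 10 binary parents:
assert n_parameters(2, [2] * 2) == 4
assert n_parameters(2, [2] * 10) == 1024
```

A slave assigned the second node has 256 times as many parameters to estimate as one assigned the first, which is the disparity discussed above.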
It is also important to note that parameter estimation is efficient in terms of
computational complexity compared to most other problems concerning Bayesian
networks, in both structure learning and inference. Both discrete and Gaussian
Bayesian networks have closed-form estimators that can be computed in linear time
(in the sample size) for the respective parameters. For this reason, the reduction in
the execution time resulting from a parallel implementation is likely to be negligible
over the whole analysis.
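To illustrate why a parallel implementation has little to gain here, consider the Gaussian case: the parameters of a node with a single parent have a closed-form estimate (the least-squares intercept and slope) computable from sums accumulated in one pass over the data, i.e. in linear time in the sample size. The function below is a sketch of that single-parent case, not a general implementation.

```python
# Closed-form, single-pass estimate of a Gaussian node with one parent:
# y = mu + beta * x + noise. Cost is linear in the sample size.
def fit_gaussian_node(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    beta = (sxy - sx * sy / n) / (sxx - sx * sx / n)  # slope
    mu = (sy - beta * sx) / n                         # intercept
    return mu, beta

# Data generated exactly as y = 1 + 2x, so the estimates recover (1, 2):
mu, beta = fit_gaussian_node([0, 1, 2, 3], [1, 3, 5, 7])
```

With per-node costs this low, the fixed overhead of spawning slaves and copying data dominates, which is the point made above.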
5.4 Applications to Inference Procedures
Inference on Bayesian networks can be performed using a variety of techniques,
some specific to Bayesian networks (see Chap. 4 ), some defined in more general
settings. Exploring applications of parallel computing to such a wide range of tech-
niques would be impossible in the space of this chapter. For this reason, we will con-
centrate only on three common inference techniques: bootstrap , cross-validation ,
and conditional probability queries .
5.4.1 Bootstrap
Bootstrap is a very general tool for investigating probability distributions. It is
also embarrassingly parallel, because bootstrap samples are mutually independent.
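The independence of bootstrap samples maps directly onto a pool of slaves: each worker resamples the data with replacement and computes its statistic with no communication. The sketch below uses the sample mean as a stand-in statistic; the data, the statistic, and the seeding scheme are all illustrative assumptions.

```python
# Sketch of an embarrassingly parallel bootstrap: each worker draws one
# resample (with replacement) and evaluates the statistic independently.
import random
from multiprocessing import Pool

DATA = [1.0, 2.0, 2.5, 3.0, 4.5, 5.0]

def one_replicate(seed):
    rng = random.Random(seed)                 # independent stream per task
    resample = [rng.choice(DATA) for _ in DATA]
    return sum(resample) / len(resample)      # statistic: the sample mean

if __name__ == "__main__":
    with Pool() as pool:
        replicates = pool.map(one_replicate, range(200))
    # replicates approximates the sampling distribution of the mean
```

Seeding each task separately keeps the replicates reproducible while preserving their mutual independence.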