Furthermore, the computational complexity of the conditional independence tests and of the network scores themselves must be taken into account; in most cases it is linear in the sample size. In R, computing conditional probabilities requires one pass over the data, while computing partial correlations requires two.
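As an illustration, the difference can be sketched in a few lines of R. The data frames d.disc (discrete variables A and B) and d.num (continuous variables) are hypothetical placeholders, and the snippets only mirror the pass counts described above rather than R's actual internals.

# one pass: tabulate the data and normalize to obtain the conditional
# probabilities of A given B.
cpt <- prop.table(table(d.disc$A, d.disc$B), margin = 2)

# two passes: the means first, then the cross-products of the centered
# data, from which partial correlations are derived.
m <- colMeans(d.num)
S <- crossprod(scale(d.num, center = m, scale = FALSE)) / (nrow(d.num) - 1)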
As a result, in practice most structure learning algorithms can be applied to data sets with a few hundred variables at most. Parallel implementations of learning algorithms provide a significant performance boost, thus improving our ability to handle large networks. However, it is important to note that execution time decreases at most linearly with the number of slave processes. For this reason, parallelization cannot be considered a universal solution, even though it may prove useful in many situations.
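As a concrete example, the sketch below spawns two slave processes with the parallel package and passes them to a constraint-based algorithm through the cluster argument of bnlearn's structure learning functions; learning.test is a small data set shipped with bnlearn, used here only as a placeholder. The number of processes created by makeCluster() bounds the attainable speedup.

library(bnlearn)
library(parallel)

data(learning.test)

# spawn two slave processes; execution time can decrease at most by a
# factor of two.
cl <- makeCluster(2)
# distribute the conditional independence tests over the cluster.
dag <- gs(learning.test, cluster = cl)
stopCluster(cl)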
5.3.1 Constraint-Based Structure Learning Algorithms
Constraint-based algorithms display a coarse-grained parallelism, because their parts need to be synchronized only a couple of times. If we examine Algorithm 2.1 again, we can see that:
1. The first step is embarrassingly parallel, because each d-separating set can be learned independently of the others. Another solution is to split this step into one part per node, each part learning all the d-separating sets involving that particular node. The former approach can take advantage of a greater number of processors, while the latter has less overhead because fewer parts run in parallel.
2. The same holds for the second step. Once all the d-separating sets are known, it
is embarrassingly parallel and can be split in the same way as the first step.
3. The third step is sequential, because each iteration requires the status of the
previous one.
Therefore, the information available to the slave processes has to be collected and synchronized only between the first and the second step and between the second and the third step.
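In R, the per-node split of the first step could be expressed as in the sketch below, which assumes an already created cluster cl and a hypothetical function learn.dsep(node, data) that learns all the d-separating sets involving a given node; the second step could be distributed in the same way.

library(parallel)

# first step, split by node: each slave learns the d-separating sets
# involving the nodes it has been assigned (learn.dsep is hypothetical).
dsep.sets <- parLapply(cl, names(dataset), learn.dsep, data = dataset)
# synchronization point: the master collects the d-separating sets
# before the second step starts.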
Most modern constraint-based algorithms, which learn the Markov blankets of
the nodes as an intermediate step, require one additional synchronization.
For example, if we consider the Grow-Shrink algorithm as shown in Fig. 5.1, we can see that:
1. Each Markov blanket can be computed independently from the others.
2. Each neighborhood is a subset of the corresponding Markov blanket and, therefore, can be learned independently from the others. However, the consistency of the Markov blankets must be checked before learning neighborhoods; due to errors in the conditional independence tests, they may not be symmetric (see Sect. 2.3). A solution to this problem is to examine all pairs of nodes and remove each from the other's Markov blanket unless both appear in each other's blankets; a minimal sketch of this check is given below.
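In R, this symmetry check could be implemented along the lines of the sketch below, where mb is assumed to be a named list mapping each node to the (possibly asymmetric) Markov blanket learned in the first step.

# keep a node in a Markov blanket only if the relationship is symmetric,
# that is, if each node also appears in the blanket of the other.
fix.symmetry <- function(mb) {
  for (node in names(mb))
    mb[[node]] <- Filter(function(other) node %in% mb[[other]], mb[[node]])
  mb
}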