user system elapsed
0.440 0.036 13.007
> stopCluster(cl)
In fact, cross-validation can be used to evaluate any combination of structure
learning algorithms, parameter learning methods, and the respective tuning parameters. It can also be used to evaluate a predetermined network structure; in this case, X1, ..., Xk are used only for parameter learning. Consider, for instance, the naive Bayes classifier (Borgelt et al., 2009), which is equivalent to a star-shaped network with the training variable at the center and all arcs pointing from it to the explanatory variables.
> naive = naive.bayes(training = "CompPlFcst",
+ data = hailfinder)
> bn.cv(hailfinder, naive, loss = "pred")
k-fold cross-validation for Bayesian networks
target network structure:
[Naive Bayes Classifier]
number of subsets:
10
loss function:
Classification Error
training node: CompPlFcst
expected loss: 0
As expected, the classification error is considerably lower than with hc or tabu. Naive Bayes is, despite its simple structure and strong assumptions, one of the most efficient and effective algorithms in data mining and classification (Zhang, 2004).
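The star-shaped structure makes the classifier simple enough to implement from scratch, which clarifies what naive.bayes is estimating: a prior over the training variable and one conditional probability table per explanatory variable, combined under the assumption that the explanatory variables are independent given the class. The following base-R sketch illustrates this on an invented toy data set (the function and variable names are ours, not part of bnlearn).

```r
# A minimal naive Bayes classifier for discrete data, written from
# scratch to illustrate the star-shaped structure: the class variable
# is the parent of every feature, and features are assumed
# conditionally independent given the class.
naive_bayes_fit = function(class, features) {
  prior = table(class) / length(class)
  cpts = lapply(features, function(x)
    prop.table(table(x, class), margin = 2))  # P(feature | class)
  list(prior = prior, cpts = cpts)
}

naive_bayes_predict = function(model, newdata) {
  classes = names(model$prior)
  scores = sapply(classes, function(cl) {
    logp = log(model$prior[[cl]])
    for (j in seq_along(newdata))
      logp = logp + log(model$cpts[[j]][newdata[[j]], cl])
    logp
  })
  classes[which.max(scores)]
}

# toy training data, invented for illustration.
class = factor(c("a", "a", "a", "b", "b", "b"))
features = list(f1 = factor(c("x", "x", "x", "y", "y", "x")),
                f2 = factor(c("u", "u", "v", "v", "v", "v")))
model = naive_bayes_fit(class, features)
naive_bayes_predict(model, list(f1 = "x", f2 = "u"))  # "a"
```

Prediction picks the class maximizing the log-posterior, which is exactly the inference performed in the star-shaped network above.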
The performance gain from the use of a snow cluster is not as marked as in the
previous example (the execution time halves with two slaves, but does not improve
beyond that). This difference in behavior suggests that most of the execution time
in the previous example was spent learning the structure of the network and that, as
anticipated in Sect. 5.3.4 , parameter learning is relatively fast in comparison.
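For completeness, a sketch of how the cross-validation above can be distributed over a cluster: bn.cv accepts a cluster argument in the bnlearn versions we are aware of (an assumption worth verifying against the installed version), so the folds are farmed out to the slave processes.

```r
# Parallel cross-validation of the naive Bayes classifier; assumes
# bnlearn is installed and that bn.cv() accepts a cluster argument.
library(bnlearn)
library(parallel)

data(hailfinder)
naive = naive.bayes(training = "CompPlFcst", data = hailfinder)

cl = makeCluster(2)
cv = bn.cv(hailfinder, naive, loss = "pred", cluster = cl)
stopCluster(cl)
cv
```

Since only parameter learning happens inside each fold, the speedup from additional slaves is limited, consistent with the timings reported above.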
5.4.3 Conditional Probability Queries
Conditional probability queries are the most common form of Bayesian network inference; as a result, parallel implementations of the exact and approximate algorithms covered in Sect. 4.1.2 have been investigated in the literature. Particle filter algorithms, in particular, exhibit coarse-grained parallelism if the particles are generated using Markov chain Monte Carlo approaches, and are embarrassingly parallel if the particles are independent. Logic sampling, illustrated in Algorithm 4.2, falls into the second category.
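The embarrassingly parallel nature of logic sampling can be demonstrated with base R alone: each slave simulates its own batch of particles from the network, and the conditional probability is estimated from the pooled particles that match the evidence. The sketch below uses a hypothetical two-node network A -> B with invented probability tables, not a bnlearn network.

```r
# Logic sampling is embarrassingly parallel: workers simulate disjoint
# batches of particles, which are then pooled to estimate the query.
library(parallel)

simulate_batch = function(n) {
  # sample A from its marginal, then B from P(B | A);
  # both tables are invented for illustration.
  a = sample(c("yes", "no"), n, replace = TRUE, prob = c(0.3, 0.7))
  p_b = ifelse(a == "yes", 0.9, 0.2)           # P(B = "yes" | A)
  b = ifelse(runif(n) < p_b, "yes", "no")
  data.frame(A = a, B = b)
}

cl = makeCluster(2)
clusterSetRNGStream(cl, 42)                    # reproducible streams
batches = parLapply(cl, rep(5000, 4), simulate_batch)
stopCluster(cl)

particles = do.call(rbind, batches)
# estimate P(A = "yes" | B = "yes") from the particles that match
# the evidence B = "yes".
matching = particles[particles$B == "yes", ]
mean(matching$A == "yes")
```

The exact answer by Bayes' theorem is 0.27/0.41, approximately 0.66, and the pooled estimate converges to it as the number of particles grows, regardless of how the particles are split across slaves.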
Consider, for example, how the knowledge that there is weather instability in the mountains (i.e., InsInMt == "Strong") and that there is a marked cloud