When sample data are incomplete, except for some special cases, we need to resort to approximation methods, such as the Monte Carlo method, Gaussian approximation, or the EM algorithm, to find the Maximum Likelihood (ML) or Maximum A Posteriori (MAP) estimates. Although these algorithms are mature, their computational cost is large.
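To make the EM idea concrete, here is a minimal sketch on a toy incomplete-data problem: two coins with unknown biases, where the identity of the coin used in each trial is the missing variable. The data and variable names are illustrative assumptions, not an example from the text.

import numpy as np

def em_two_coins(heads, tosses, iters=50):
    # Illustrative EM sketch: the coin behind each trial is latent.
    theta = np.array([0.6, 0.4])               # initial bias guesses
    for _ in range(iters):
        # E-step: responsibility of each coin for each trial, from the
        # binomial likelihood (uniform prior over the two coins).
        like = (theta ** heads[:, None]) * ((1 - theta) ** (tosses - heads)[:, None])
        resp = like / like.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood update of each bias.
        theta = (resp * heads[:, None]).sum(0) / (resp * tosses[:, None]).sum(0)
    return theta

heads = np.array([9, 8, 2, 3, 8, 1, 9, 2])     # heads out of 10 tosses each
tosses = np.full_like(heads, 10)
print(em_two_coins(heads, tosses))             # converges to roughly [0.85, 0.2]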
6.5.3 Structure learning of Bayesian networks
When the structure of a Bayesian network is undetermined, it is possible to learn both the network structure and the probabilities from data. Because data mining deals with huge amounts of data, among which the relations between variables are hard to tell in advance, the structure-learning problem is of practical importance.
The network structure that encodes the physical joint probability distribution of X is uncertain. Following the Bayesian approach, we define a discrete variable to represent this uncertainty about the network structure; its states correspond to all possible network structure hypotheses $S^h$, and we assign it a prior $p(S^h)$. Given a random sample D drawn from the physical distribution of X, we calculate the posterior probabilities $p(S^h \mid D)$ and $p(\theta_S \mid D, S^h)$, where $\theta_S$ is the parameter vector of structure $S^h$. We then use these posteriors to calculate the expectations of interest.
The computation of $p(S^h \mid D)$ and $p(\theta_S \mid D, S^h)$ is similar to what we illustrated in the previous section. The computation of $p(S^h \mid D)$ is theoretically easy. According to Bayes' theorem, we have:
$$p(S^h \mid D) = p(S^h, D)/p(D) = p(S^h)\,p(D \mid S^h)/p(D) \qquad (6.40)$$
where $p(D)$ is a structure-independent normalizing constant and $p(D \mid S^h)$ is the marginal likelihood. To determine the posterior of the network structure, we need only calculate the marginal likelihood for each possible structure.
Under the assumptions of an unrestricted multinomial distribution, parameter independence, Dirichlet priors, and complete data, each parameter vector $\theta_{ij}$ can be updated independently, and the marginal likelihood of the data is exactly the product of the marginal likelihoods of each i-j pair:
$$p(D \mid S^h) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \cdot \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})} \qquad (6.41)$$
where $N_{ijk}$ is the number of cases in D in which $X_i$ takes its k-th value and its parents take their j-th configuration, $N_{ij} = \sum_k N_{ijk}$, and $\alpha_{ij} = \sum_k \alpha_{ijk}$.
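For reference, a direct transcription of (6.41) in log space (to avoid numerical overflow) might look as follows. The nested-list layout of the counts $N_{ijk}$ and hyperparameters $\alpha_{ijk}$ is an assumed, illustrative encoding; any representation indexed by (i, j, k) for a fixed structure $S^h$ works the same way.

import numpy as np
from scipy.special import gammaln

def log_marginal_likelihood(counts, alpha):
    # log p(D | S^h) per equation (6.41); counts[i][j][k] holds N_ijk
    # and alpha[i][j][k] the Dirichlet hyperparameters (assumed layout).
    log_ml = 0.0
    for N_i, a_i in zip(counts, alpha):          # variables i = 1..n
        for N_ij, a_ij in zip(N_i, a_i):         # parent configs j = 1..q_i
            N_ij, a_ij = np.asarray(N_ij, float), np.asarray(a_ij, float)
            log_ml += gammaln(a_ij.sum()) - gammaln(a_ij.sum() + N_ij.sum())
            log_ml += (gammaln(a_ij + N_ij) - gammaln(a_ij)).sum()
    return log_ml

# Example: one binary variable with no parents (n=1, q_1=1, r_1=2),
# uniform prior alpha_ijk = 1, and data containing 3 zeros and 7 ones;
# the result is log(1/1320), the exact Bayesian marginal likelihood.
print(log_marginal_likelihood([[[3, 7]]], [[[1, 1]]]))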
This formula was originally proposed by Cooper and Herskovits in 1992 (Cooper, 1992).
In general, the number of possible Bayesian networks over n variables grows more than exponentially in n, so it is intractable to enumerate all of these hypotheses. Two approaches can be used to handle this problem, namely model selection and selective model averaging. The former approach is to select a "good" model from all the possible models (structure hypotheses) and use it as if it were the correct model.
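As an illustration of model selection, the following sketch performs greedy hill-climbing over structures: starting from an empty graph, it repeatedly adds the single parent-child edge that most improves the score (for example, the log marginal likelihood of (6.41)) and stops when no addition helps. The DAG encoding and the score(dag, data) interface are assumptions made for this sketch, not the book's algorithm.

def is_acyclic(dag):
    # Kahn-style check: repeatedly remove nodes whose parents are all gone.
    parents = {v: set(ps) for v, ps in dag.items()}
    while parents:
        roots = [v for v, ps in parents.items() if not ps]
        if not roots:
            return False                     # a cycle remains
        for v in roots:
            del parents[v]
        for ps in parents.values():
            ps.difference_update(roots)
    return True

def greedy_structure_search(n_vars, data, score):
    dag = {i: set() for i in range(n_vars)}  # map: child -> set of parents
    current = score(dag, data)
    while True:
        best_gain, best_edge = 0.0, None
        for child in range(n_vars):
            for parent in range(n_vars):
                if parent == child or parent in dag[child]:
                    continue
                dag[child].add(parent)       # tentatively add the edge
                if is_acyclic(dag):
                    gain = score(dag, data) - current
                    if gain > best_gain:
                        best_gain, best_edge = gain, (parent, child)
                dag[child].remove(parent)    # undo the tentative edge
        if best_edge is None:
            return dag, current              # local optimum reached
        dag[best_edge[1]].add(best_edge[0])  # commit the best edge
        current += best_gain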