When sample data are incomplete, except for some special cases, we need to resort to approximation methods, such as the Monte Carlo method, Gaussian approximation, or the EM algorithm, to find the Maximum Likelihood (ML) or Maximum A Posteriori (MAP) estimates. Although these algorithms are mature, their computational cost is large.
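To make the EM idea concrete, here is a minimal sketch on a toy incomplete-data problem: two coins with unknown biases, where the identity of the coin used in each trial is the missing variable. The data and variable names are illustrative assumptions, not an example from the text.

import numpy as np

def em_two_coins(heads, tosses, iters=50):
    # Illustrative EM sketch: the coin behind each trial is latent.
    theta = np.array([0.6, 0.4])               # initial bias guesses
    for _ in range(iters):
        # E-step: responsibility of each coin for each trial, from the
        # binomial likelihood (uniform prior over the two coins).
        like = (theta ** heads[:, None]) * ((1 - theta) ** (tosses - heads)[:, None])
        resp = like / like.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood update of each bias.
        theta = (resp * heads[:, None]).sum(0) / (resp * tosses[:, None]).sum(0)
    return theta

heads = np.array([9, 8, 2, 3, 8, 1, 9, 2])     # heads out of 10 tosses each
tosses = np.full_like(heads, 10)
print(em_two_coins(heads, tosses))             # converges to roughly [0.85, 0.2]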
6.5.3 Structure learning of Bayesian networks
When the structure of a Bayesian network is undetermined, it is possible to learn both the network structure and the probabilities from data. Because data mining deals with huge amounts of data, among which the relations between variables are hard to tell in advance, the structure-learning problem is of practical importance.
The network structure that encodes the physical joint probability distribution of X is uncertain. Following the Bayesian approach, we define a discrete variable to represent this uncertainty about the network structure; its states correspond to all possible network structure hypotheses $S^h$, and we assign it a prior $p(S^h)$. Given a random sample D drawn from the physical distribution of X, we calculate the posterior probabilities $p(S^h \mid D)$ and $p(\theta_S \mid D, S^h)$, where $\theta_S$ is the parameter vector of structure $S^h$. We then use these posteriors to calculate the expectations of interest.
The computation of $p(S^h \mid D)$ and $p(\theta_S \mid D, S^h)$ is similar to what we illustrated in the previous section. The computation of $p(S^h \mid D)$ is theoretically easy. According to Bayes' theorem, we have:
$$p(S^h \mid D) = p(S^h, D)/p(D) = p(S^h)\,p(D \mid S^h)/p(D) \qquad (6.40)$$
where $p(D)$ is a structure-independent normalizing constant and $p(D \mid S^h)$ is the marginal likelihood. To determine the posterior of the network structure, we need only calculate the marginal likelihood for each possible structure.
Under the assumptions of an unrestricted multinomial distribution, parameter independence, Dirichlet priors, and complete data, each parameter vector $\theta_{ij}$ can be updated independently, and the marginal likelihood of the data is exactly the product of the marginal likelihoods of each i-j pair:
$$p(D \mid S^h) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \cdot \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})} \qquad (6.41)$$
where $N_{ijk}$ is the number of cases in D in which $X_i$ takes its k-th value and its parents take their j-th configuration, $N_{ij} = \sum_k N_{ijk}$, and $\alpha_{ij} = \sum_k \alpha_{ijk}$.
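For reference, a direct transcription of (6.41) in log space (to avoid numerical overflow) might look as follows. The nested-list layout of the counts $N_{ijk}$ and hyperparameters $\alpha_{ijk}$ is an assumed, illustrative encoding; any representation indexed by (i, j, k) for a fixed structure $S^h$ works the same way.

import numpy as np
from scipy.special import gammaln

def log_marginal_likelihood(counts, alpha):
    # log p(D | S^h) per equation (6.41); counts[i][j][k] holds N_ijk
    # and alpha[i][j][k] the Dirichlet hyperparameters (assumed layout).
    log_ml = 0.0
    for N_i, a_i in zip(counts, alpha):          # variables i = 1..n
        for N_ij, a_ij in zip(N_i, a_i):         # parent configs j = 1..q_i
            N_ij, a_ij = np.asarray(N_ij, float), np.asarray(a_ij, float)
            log_ml += gammaln(a_ij.sum()) - gammaln(a_ij.sum() + N_ij.sum())
            log_ml += (gammaln(a_ij + N_ij) - gammaln(a_ij)).sum()
    return log_ml

# Example: one binary variable with no parents (n=1, q_1=1, r_1=2),
# uniform prior alpha_ijk = 1, and data containing 3 zeros and 7 ones;
# the result is log(1/1320), the exact Bayesian marginal likelihood.
print(log_marginal_likelihood([[[3, 7]]], [[[1, 1]]]))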
This formula was originally proposed by Cooper and Herskovits in 1992 (Cooper, 1992).
In general, the number of possible Bayesian networks over n variables grows more than exponentially in n, so it is intractable to enumerate all of these hypotheses. Two approaches can be used to handle this problem, namely model selection and selective model averaging. The former approach is to select a "good" model from all the possible models (structure hypotheses) and use it as if it were the correct model.
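As an illustration of model selection, the following sketch performs greedy hill-climbing over structures: starting from an empty graph, it repeatedly adds the single parent-child edge that most improves the score (for example, the log marginal likelihood of (6.41)) and stops when no addition helps. The DAG encoding and the score(dag, data) interface are assumptions made for this sketch, not the book's algorithm.

def is_acyclic(dag):
    # Kahn-style check: repeatedly remove nodes whose parents are all gone.
    parents = {v: set(ps) for v, ps in dag.items()}
    while parents:
        roots = [v for v, ps in parents.items() if not ps]
        if not roots:
            return False                     # a cycle remains
        for v in roots:
            del parents[v]
        for ps in parents.values():
            ps.difference_update(roots)
    return True

def greedy_structure_search(n_vars, data, score):
    dag = {i: set() for i in range(n_vars)}  # map: child -> set of parents
    current = score(dag, data)
    while True:
        best_gain, best_edge = 0.0, None
        for child in range(n_vars):
            for parent in range(n_vars):
                if parent == child or parent in dag[child]:
                    continue
                dag[child].add(parent)       # tentatively add the edge
                if is_acyclic(dag):
                    gain = score(dag, data) - current
                    if gain > best_gain:
                        best_gain, best_edge = gain, (parent, child)
                dag[child].remove(parent)    # undo the tentative edge
        if best_edge is None:
            return dag, current              # local optimum reached
        dag[best_edge[1]].add(best_edge[0])  # commit the best edge
        current += best_gain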