Using Cooperative Coevolution for Data Mining of Bayesian Networks - Advanced Techniques in Knowledge Discovery and Data Mining

Database Reference

In-Depth Information

Quoting from [6.5]: “In general, two variables X and Y may be conditionally

dependent given a set Z while independent on the subset or superset of Z.”

In the worst case, like in SGS, every possible combination of the conditioning

set should be examined, which would require an exponential number of tests.

Second, results from CI tests may not be reliable for high-order CI tests (when

the size of the conditioning set is high) [6.5], [6.15]. Hence, for algorithms that

require high-order CI tests, the results may be inaccurate. Third, because a

network is constructed in a step-by-step manner, the construction algorithm

may be unstable in the sense that an earlier mistake during construction is

consequential [6.5], [6.16]. Moreover, this suggests that the order of testing

the CI relations is important, which will be a concern when one pursues

optimal performance.

The Search-and-Scoring Approach. The second approach is called the

search-and-scoring approach. Recall that a Bayesian network encodes a joint

probability distribution (Eq. 6.2); we can derive a measure to assess the good-

ness of such encoding. For instance, such a measure could be derived from

Bayesian statistics, information theory, or the minimum description length

(MDL) principle [6.17]. Though their theoretical foundations are different,

some studies [6.18], [6.19] show that different metrics are asymptotically

equivalent under certain conditions.

BecauseweemploytheMDLmetric[6.20]inourwork,wetakeitas

an example for illustration. Basically, the metric is derived from information

theory and incorporates the idea of the minimum description length principle.

The metric has two components: the network description length and the

data description length. An optimal network is the one that minimizes both

simultaneously.

Formally, let U =

be the set of discrete variables, Π N i be

the parent set of a node N i in the candidate network, and v i be the number

of possible states of the variable N i . The network description length is given

by

{

N 1 ,...,N n }

⎡

⎤

n

1)

N j ∈Π N i

⎣ |

⎦ ,

Π N i |

log

(n)+d(v i −

v j

2

i=1

where d is a constant denoting the number of bits used to store a numerical

value. Intuitively, the network description length represents the structural

complexity of the network, which is evaluated by the number of bits required

to encode the graphical structure and store the conditional probability table

at each node.

The data description length is given by

n

M(Π N i )

M(N i ,Π N i ) ,

M(N i ,Π N i )log 2

i=1

N i ,Π N i

Advanced Techniques in Knowledge Discovery and Data Mining

Search WWH ::

Custom Search

Home