is presumed for a given distribution P [6.5]. In other words, it is assumed that
there exists a Bayesian network, G, that captures all the conditional indepen-
dence relations implied by P . Consequently, this suggests a general learning
methodology: Construct a network G by testing the validity of any inde-
pendence assertions I(X, Z, Y ). In practice, we can use what is collectively
called the conditional independence test (CI test). If the statement I(X, Z, Y )
is supported by the data, it follows that X should be d-separated [6.10] from
Y by Z in G; otherwise, X is not d-separated from Y by Z.
As a digression from the ongoing discussion, we give a brief description
of the CI test. A common approach is to use a hypothesis testing proce-
dure discussed in the statistical literature [6.5], [6.12], [6.13]. To begin, the
conditional independence assertion (i.e., I(X, Z, Y )) is modeled as the null
hypothesis. Suppose we use the likelihood-ratio χ 2
test; the χ 2
statistics is
calculated by:
2 observed
G 2 =
log(expected/observed).
(6.3)
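As a concrete illustration, the statistic in (6.3) can be computed directly from two lists of counts. The function name and the example counts below are illustrative, not taken from the text:

```python
import math

def g_squared(observed, expected):
    """Likelihood-ratio statistic G^2 = 2 * sum(o * log(o / e)),
    summed over every distinct event; cells with a zero observed
    count contribute nothing to the sum."""
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

# When the observed counts match the expected counts exactly,
# the statistic is zero and the data give no evidence against the hypothesis.
print(g_squared([25, 25, 25, 25], [25, 25, 25, 25]))  # → 0.0
```

Larger discrepancies between the two lists drive G² upward, which is what the hypothesis test later converts into a p-value.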
Simply put, the statistic measures, over every distinct event, the discrep-
ancy between the actual count, observed, and the count expected under the
hypothesis, expected. In our case, because I(X, Z, Y ) implies

    P(x, y, z) = P(x | y, z) P(y, z) = P(x | z) P(y, z),

the statistic is computed by:

    G² = 2 Σ_{x,y,z} P(x, y, z) log [ P(x, y, z) / (P(y, z) P(x | z)) ].    (6.4)
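To see that (6.4) behaves as intended, consider a small distribution constructed so that X and Y really are independent given Z; the G² sum then vanishes. The specific probability tables below are made up for illustration:

```python
import math
from itertools import product

# Hypothetical binary distribution built as P(x|z) P(y|z) P(z),
# so I(X, Z, Y) holds exactly by construction.
p_z = {0: 0.4, 1: 0.6}
p_x_given_z = {0: {0: 0.3, 1: 0.7}, 1: {0: 0.8, 1: 0.2}}
p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}

def p_xyz(x, y, z):
    return p_x_given_z[z][x] * p_y_given_z[z][y] * p_z[z]

def g2():
    """G^2 of equation (6.4): 2 * sum over (x, y, z) of
    P(x,y,z) * log[ P(x,y,z) / (P(y,z) * P(x|z)) ]."""
    total = 0.0
    for x, y, z in product((0, 1), repeat=3):
        p = p_xyz(x, y, z)
        p_yz = sum(p_xyz(xx, y, z) for xx in (0, 1))  # marginal P(y, z)
        total += p * math.log(p / (p_yz * p_x_given_z[z][x]))
    return 2.0 * total

print(round(g2(), 9))  # ≈ 0.0, since the independence assertion holds
```

Replacing the factored tables with a joint distribution in which X and Y interact given Z would make the statistic strictly positive.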
Suppose the numbers of possible instantiations of the variables X, Y , and Z
are, respectively, v_X, v_Y, and v_Z; G² follows a χ² distribution with
(v_X - 1) × (v_Y - 1) × v_Z degrees of freedom. Checking our computed G² against the χ²
distribution, we obtain the p-value, which is “the smallest level of significance
for which the data lead to the rejection of the null hypothesis” [6.14]. If the
p-value is less than a predefined cutoff value α, the test shows strong evidence
to reject the hypothesis; otherwise, the hypothesis cannot be rejected.
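The accept/reject step can be sketched in a few lines. Since the exact χ² tail probability is not in the Python standard library, the sketch below substitutes the Wilson-Hilferty cube-root normal approximation for it; the function names and the sample inputs are illustrative:

```python
import math

def chi2_sf(x, df):
    """Approximate upper-tail probability of a chi-square distribution
    via the Wilson-Hilferty cube-root normal approximation (not exact,
    but adequate away from very small df)."""
    if x <= 0:
        return 1.0
    z = (((x / df) ** (1.0 / 3.0)) - (1.0 - 2.0 / (9.0 * df))) \
        / math.sqrt(2.0 / (9.0 * df))
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def ci_test(g2, df, alpha=0.05):
    """Return (p_value, accepted): the hypothesis I(X, Z, Y) is rejected
    when the p-value falls below the cutoff alpha."""
    p_value = chi2_sf(g2, df)
    return p_value, p_value >= alpha

p_value, accepted = ci_test(20.0, df=10)
print(round(p_value, 3), accepted)  # small p-value: independence rejected
```

With a small G² (say 3.0 on the same 10 degrees of freedom) the p-value is large and the independence hypothesis cannot be rejected.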
Take the SGS algorithm [6.5] as an illustration. The algorithm begins
with a completely connected undirected graph. In other words, dependence
between every pair of variables is assumed. Then CI tests between all pairs
of connected nodes are conducted. When two nodes X and Y are found to
be conditionally independent given Z, the undirected edge between them is
removed so that I(X, Z, Y ) is not violated. When no more edges can be re-
moved, the undirected edges in the graph are oriented according to some rules
that conform to the conditional independence relations discovered previously.
This produces the final Bayesian network.
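The edge-removal phase described above can be sketched as follows, with `is_independent` standing in for a CI-test oracle (e.g. the G² test); the orientation phase is omitted, and all names are illustrative:

```python
from itertools import combinations

def sgs_skeleton(variables, is_independent):
    """Sketch of the SGS edge-removal phase: start from a complete
    undirected graph and delete the edge X-Y whenever some conditioning
    set Z renders X and Y conditionally independent.

    is_independent(x, y, z) is a CI-test oracle taking two variables
    and a frozenset of conditioning variables."""
    edges = set(frozenset(pair) for pair in combinations(variables, 2))
    for x, y in combinations(variables, 2):
        others = [v for v in variables if v not in (x, y)]
        # Try every conditioning set Z drawn from the remaining variables.
        for size in range(len(others) + 1):
            if any(is_independent(x, y, frozenset(z))
                   for z in combinations(others, size)):
                edges.discard(frozenset((x, y)))
                break
    return edges
```

For example, with three variables A, B, C and an oracle asserting only I(A, {B}, C), the surviving skeleton is the chain A - B - C. Note the exhaustive search over conditioning sets is what makes SGS expensive as the number of variables grows.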
In general, there are three problems typical to the dependency analysis
approach. First, it is difficult to determine whether two nodes are dependent.