Using Cooperative Coevolution for Data Mining of Bayesian Networks - Advanced Techniques in Knowledge Discovery and Data Mining

Database Reference

In-Depth Information

A

0 0 0 0

1 0 0 0

node B's parent set

C

B

0 0 0 0

0 1 1 0

D

Fig. 6.4. Matrix representation of a DAG.

A traditional GA (with one-point crossover and mutation) is applied to search

for the optimal solution represented in the bit-string representation. For

the fitness function, they adopted the Bayesian score (called the BD score

in [6.21]), which was used in the K2 algorithm [6.22]. Note that because the

genetic operators can generate illegal offspring structures (i.e., networks that

are not DAG), cycle repairing is needed after an offspring is produced. 3

Because it is rare to have a densely connected network in real-world prob-

lems, they imposed a limit on the number of parents a node could have in

its implementation. 4 Even though such a restriction greatly reduces the pos-

sible search space, the problem is still NP-hard, as suggested by the work of

Hoffgen [6.32].

They conducted a number of experiments to test the GA approach with

different implementations under different parameter settings. Based on the

results, several recommendations regarding the choice of implementation and

parameters are made.

6.3.2 Using EP

Recently, Wong et al. [6.9] used evolutionary programming to tackle the

learning problem. Because they used the minimum description length met-

ric [6.20] to evaluate fitness, they called their approach MDLEP.

EP is different from GAs mainly in the format of solution representation

and the genetic operators used [6.33], [6.26]. Unlike the restricted use of

string in GAs, EP does not have any restriction in solution representation.

An individual in MDLEP is simply the connectivity matrix of the network.

3 In a later work, they considered the case that a node ordering is given.With a

node ordering, the genetic operators become closed operators, which means no

repairing is required.

4 The same limit is imposed in MDLEP and all of our algorithms.

Search WWH ::

Custom Search

Home