Using Cooperative Coevolution for Data Mining of Bayesian Networks - Advanced Techniques in Knowledge Discovery and Data Mining


6.5 Performance of CCGA

6.5.1 Experimental Methodology

A common way to assess the performance of a Bayesian network learning algorithm is to test it on data sets generated from known network structures by probabilistic logic sampling [6.37]. We follow this practice and test our algorithm on seven different data sets. All of the data sets are generated from well-known benchmark Bayesian networks: the ALARM, the ASIA, and the PRINTD networks. Table 6.1 summarizes the data sets used in our experiments.
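Probabilistic logic sampling can be illustrated with a minimal sketch: visit the nodes in topological order and sample each one from its conditional distribution given the values already drawn for its parents. The three-node network, its node names, and all probabilities below are hypothetical and chosen only for illustration; they are not taken from the ALARM, ASIA, or PRINTD networks.

```python
import random

# Hypothetical three-node network: rain -> wet_grass <- sprinkler.
# Nodes are listed in topological order (parents before children).
# All probabilities are invented for illustration only.
NETWORK = [
    # (node, parents, table mapping parent values -> P(node = 1))
    ("rain", (), {(): 0.2}),
    ("sprinkler", (), {(): 0.3}),
    ("wet_grass", ("rain", "sprinkler"), {
        (0, 0): 0.05, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.99,
    }),
]


def sample_case(network, rng):
    """Draw one complete case by sampling each node given its sampled parents."""
    case = {}
    for node, parents, cpt in network:
        p_true = cpt[tuple(case[p] for p in parents)]
        case[node] = 1 if rng.random() < p_true else 0
    return case


def generate_data_set(network, n_cases, seed=0):
    """Generate n_cases independent cases using a seeded generator."""
    rng = random.Random(seed)
    return [sample_case(network, rng) for _ in range(n_cases)]


data = generate_data_set(NETWORK, 1000)
```

A learning algorithm would then receive only `data`, and the structure it recovers can be compared against the known generating network.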

Five of the data sets are generated from the ALARM network obtained from different sources. The ALARM network was originally used in the medical domain to diagnose potential anesthesia problems in the operating room [6.38]. Because the network, with 37 nodes and 46 directed edges, has a complex structure, it is widely used to evaluate the performance of learning algorithms. The PRINTD network was constructed primarily for troubleshooting printer problems in the Windows™ operating system [6.3]. It has 26 nodes and 26 edges. The ASIA-1000 data set is generated from the ASIA network, a relatively simple structure with eight nodes and eight edges. The network is also known as the “chest-clinic” network, which describes a “fictitious medical example whether a patient has tuberculosis, lung cancer, or bronchitis, related to their X-ray, dyspnea, visit-to-Asia, and smoking status” [6.39], [6.40]. The data set contains 1000 cases.
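To make the size of the ASIA network concrete, the sketch below encodes its structure as a mapping from each node to its parent set and checks the node count, the edge count, and acyclicity. The node names are shorthand chosen for this sketch, and the eighth node (`tub_or_cancer`) is the intermediate "tuberculosis or lung cancer" variable of the commonly published chest-clinic network; the encoding is an assumption for illustration, not the representation used by the authors.

```python
# Parent sets of the ASIA ("chest-clinic") network as commonly published.
# Node names are shorthand chosen for this sketch.
ASIA_PARENTS = {
    "visit_to_asia": [],
    "smoking": [],
    "tuberculosis": ["visit_to_asia"],
    "lung_cancer": ["smoking"],
    "bronchitis": ["smoking"],
    "tub_or_cancer": ["tuberculosis", "lung_cancer"],
    "xray": ["tub_or_cancer"],
    "dyspnea": ["tub_or_cancer", "bronchitis"],
}


def count_edges(parents):
    """Each (parent, child) pair is one directed edge."""
    return sum(len(ps) for ps in parents.values())


def is_acyclic(parents):
    """Repeatedly remove nodes whose remaining parent set is empty."""
    remaining = {node: set(ps) for node, ps in parents.items()}
    while remaining:
        ready = [node for node, ps in remaining.items() if not ps]
        if not ready:
            return False  # every remaining node waits on another: a cycle
        for node in ready:
            del remaining[node]
        for ps in remaining.values():
            ps.difference_update(ready)
    return True


n_nodes = len(ASIA_PARENTS)
n_edges = count_edges(ASIA_PARENTS)
```

Here `n_nodes` and `n_edges` both evaluate to 8, matching the counts given in the text.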

In our experiment, we compare the performance of our algorithm with MDLEP. All algorithms (including the implementation of MDLEP, obtained from the authors) are implemented in the C++ language and compiled using the same compiler.¹⁰ The same MDL metric evaluation routine is used so that the difference among implementations is minimized. For all algorithms, the maximum size of a parent set is five. Because the algorithms are stochastic in nature, they are executed 40 times for each test instance. All our experiments are conducted on Sun Ultra-5 workstations.

¹⁰ We use the g++ compiler with the -O2 optimization level.
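The protocol described above (40 independent runs per test instance and a parent-set limit of five) can be sketched as a small harness. The function `placeholder_learner` is a hypothetical stand-in that returns an arbitrary number so the harness can run; it is neither CCGA nor MDLEP, and the summary fields are an assumption about what one might report.

```python
import random
import statistics

N_RUNS = 40       # runs per test instance, as stated in the text
MAX_PARENTS = 5   # maximum parent-set size, as stated in the text


def placeholder_learner(data_set_name, seed):
    """Hypothetical stand-in for a stochastic structure learner.

    Returns an arbitrary 'score' so the harness is runnable; it does
    not learn anything and is not CCGA or MDLEP.
    """
    rng = random.Random(f"{data_set_name}-{seed}")
    return rng.uniform(0.0, 1.0)


def respects_parent_limit(parents, limit=MAX_PARENTS):
    """Check that no node has more than `limit` parents."""
    return all(len(ps) <= limit for ps in parents.values())


def run_trials(learner, data_set_name, n_runs=N_RUNS):
    """Run the learner once per seed and summarize the resulting scores."""
    scores = [learner(data_set_name, seed) for seed in range(n_runs)]
    return {
        "runs": len(scores),
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),
        "best": min(scores),  # a lower MDL score is better
    }


summary = run_trials(placeholder_learner, "ALARM-1000")
```

Seeding each run separately keeps the runs independent yet reproducible, which makes it easier to compare two stochastic algorithms on the same test instance.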

Data set       Original network   Size     MDL score of original network
ALARM-1000     ALARM              1000     18,533.5
ALARM-2000     ALARM              2000     34,287.9
ALARM-5000     ALARM              5000     81,223.4
ALARM-10000    ALARM              10,000   158,497.0
ALARM-O        ALARM              10,000   138,455.0
ASIA-1000      ASIA               1000     3416.9
PRINTD-5000    PRINTD             5000     106,541.6

Table 6.1. Data sets used in the experiments.
