6.5 Performance of CCGA
6.5.1 Experimental Methodology
A common practice for assessing the performance of a Bayesian network
learning algorithm is to test it on data sets generated from known network
structures by probabilistic logic sampling [6.37]. We follow this practice
and test our algorithm on seven data sets, all generated from well-known
benchmark Bayesian networks: the ALARM, ASIA, and PRINTD networks.
Table 6.1 summarizes the data sets used in our experiments.
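Probabilistic logic sampling can be sketched as forward sampling: visit the nodes in topological order and draw each value from its conditional distribution given the parent values already sampled. The Python sketch below does this for an ASIA-like network. The edge list follows the usual textbook description of the chest-clinic network; the node names, the probability values, and the helper functions are illustrative assumptions, not the parameters used to generate the data sets in our experiments.

```python
import random

# Each node maps to (parents, CPT), where the CPT gives P(node = True)
# for every combination of parent values. Probabilities are assumed,
# illustrative values only.
NETWORK = {
    "asia":   ((), {(): 0.01}),
    "smoke":  ((), {(): 0.50}),
    "tub":    (("asia",), {(True,): 0.05, (False,): 0.01}),
    "lung":   (("smoke",), {(True,): 0.10, (False,): 0.01}),
    "bronc":  (("smoke",), {(True,): 0.60, (False,): 0.30}),
    # "either" is a deterministic OR of its two parents.
    "either": (("tub", "lung"), {(True, True): 1.0, (True, False): 1.0,
                                 (False, True): 1.0, (False, False): 0.0}),
    "xray":   (("either",), {(True,): 0.98, (False,): 0.05}),
    "dysp":   (("either", "bronc"), {(True, True): 0.90, (True, False): 0.70,
                                     (False, True): 0.80, (False, False): 0.10}),
}


def topological_order(network):
    """Return the nodes so that every parent precedes its children."""
    order, placed = [], set()
    while len(order) < len(network):
        progressed = False
        for node, (parents, _) in network.items():
            if node not in placed and all(p in placed for p in parents):
                order.append(node)
                placed.add(node)
                progressed = True
        if not progressed:
            raise ValueError("network contains a cycle")
    return order


def sample_case(network, order, rng):
    """Draw one complete case by sampling each node given its parents."""
    case = {}
    for node in order:
        parents, cpt = network[node]
        p_true = cpt[tuple(case[p] for p in parents)]
        case[node] = rng.random() < p_true
    return case


def generate_data_set(network, n_cases, seed=0):
    """Generate n_cases independent cases from the network."""
    rng = random.Random(seed)
    order = topological_order(network)
    return [sample_case(network, order, rng) for _ in range(n_cases)]


data = generate_data_set(NETWORK, 1000)
n_edges = sum(len(parents) for parents, _ in NETWORK.values())
```

With these definitions, `data` holds 1000 cases over eight variables, and `n_edges` counts the eight directed edges of the structure. A learning algorithm would see only `data` and would be judged by how well it recovers the structure in `NETWORK`.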
Five of the data sets are generated from the ALARM network obtained
from different sources. The ALARM network was originally used in the
medical domain for potential anesthesia diagnosis in the operating room
[6.38]. Because the network, with 37 nodes and 46 directed edges, has a
complex structure, it is widely used to evaluate the performance of learning
algorithms. The PRINTD network was constructed primarily for
troubleshooting printer problems in the Windows™ operating system [6.3].
It has 26 nodes and 26 edges. The ASIA-1000 data set is generated from the
ASIA network, a relatively simple structure that contains eight nodes and
eight edges. The network is also known as the “chest-clinic” network, which
describes a “fictitious medical example whether a patient has tuberculosis,
lung cancer, or bronchitis, related to their X-ray, dyspnea, visit-to-Asia, and
smoking status” [6.39], [6.40]. The data set contains 1000 cases.
In our experiment, we compare the performance of our algorithm with
MDLEP. All algorithms (including the implementation of MDLEP, obtained
from the authors) are implemented in the C++ language and compiled using
the same compiler.¹⁰ The same MDL metric evaluation routine is used so that
the difference among implementations is minimized. For all algorithms, the
maximum size of a parent set is five. Because the algorithms are stochastic in
nature, they are executed 40 times for each test instance. All our experiments
are conducted on Sun Ultra-5 workstations.
¹⁰ We use the g++ compiler with the -O2 optimization level.
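Because each run of a stochastic algorithm can end with a different score, results are best summarized over the repeated runs. The following Python sketch shows one way to organize 40 seeded runs per test instance and aggregate the final scores. The function `toy_stochastic_learner` is a hypothetical stand-in that returns made-up numbers, not CCGA or MDLEP, and the sketch assumes that a lower score is better, as with a description-length metric.

```python
import random
import statistics

N_RUNS = 40  # number of runs per test instance, as stated in the text


def toy_stochastic_learner(rng):
    """Hypothetical stand-in for a learning algorithm.

    Returns a made-up final score; a real learner would search for a
    network structure and return the score of the best one found.
    """
    return 1000.0 + rng.gauss(0.0, 5.0)


def evaluate(learner, n_runs=N_RUNS, base_seed=0):
    """Run the learner n_runs times with distinct seeds and summarize."""
    scores = [learner(random.Random(base_seed + run)) for run in range(n_runs)]
    return {
        "runs": n_runs,
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),
        "best": min(scores),  # assumes lower scores are better
    }


summary = evaluate(toy_stochastic_learner)
```

Fixing one seed per run keeps the comparison reproducible: two algorithms can be evaluated with the same `base_seed`, and a rerun yields the same summary.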
Data set       Original network   Size     MDL score of original network
ALARM-1000     ALARM              1000     18,533.5
ALARM-2000     ALARM              2000     34,287.9
ALARM-5000     ALARM              5000     81,223.4
ALARM-10000    ALARM              10,000   158,497.0
ALARM-O        ALARM              10,000   138,455.0
ASIA-1000      ASIA               1000     3416.9
PRINTD-5000    PRINTD             5000     106,541.6
Table 6.1. Data sets used in the experiments.