Grammar-Guided Evolutionary Construction of Bayesian Networks - Foundations on Natural and Artificial Computation


the number of input connections to p = 1 for variable nodes, and to c = 2 for the class node, respectively. Like Figure 2.b, Figure 3.b shows the genotype, sentence and phenotype of an individual belonging to the language generated by G 3212.

The grammars in Figures 2.a and 3.a can be used to generate BN structures that solve classification problems with n = 3 feature variables. They differ in that the language generated by G 3212 builds a smaller search space than the one generated by G 3323, which lowers the complexity of EvoBANE's evolutionary process.
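To make the search-space argument concrete, the Python sketch below computes a crude upper bound on the number of classifier structures allowed by the in-degree limits p and c. It is an illustration under stated assumptions, not EvoBANE's grammar: it assumes each variable node may take parents from the other n - 1 variables plus the class node, and that the class node may take parents from the n variables. It also ignores acyclicity and any other constraint the grammars impose, so the real counts are smaller. The function names are hypothetical.

```python
from math import comb


def parent_set_count(pool_size: int, max_parents: int) -> int:
    """Number of parent sets of size 0..max_parents drawn from pool_size candidates."""
    return sum(comb(pool_size, k) for k in range(min(max_parents, pool_size) + 1))


def structure_upper_bound(n: int, p: int, c: int) -> int:
    """Upper bound on classifier structures with n feature variables.

    Assumptions (hypothetical, not taken from EvoBANE's grammars):
    - each variable node has at most p parents, drawn from the other
      n - 1 variables plus the class node (n candidates in total);
    - the class node has at most c parents, drawn from the n variables;
    - acyclicity is ignored, so this over-counts valid BNs.
    """
    per_variable = parent_set_count(n, p)  # choices for one variable node
    for_class = parent_set_count(n, c)     # choices for the class node
    return per_variable ** n * for_class


print(structure_upper_bound(3, 1, 2))  # 448
print(structure_upper_bound(3, 2, 3))  # 2744
```

With n = 3, tightening the limits from (p, c) = (2, 3) to (1, 2) shrinks this bound from 2744 to 448 candidate structures. The pairs are chosen only to illustrate how lower in-degree limits reduce the space the evolutionary process must explore.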

2.2 The Fitness Calculator

Every individual is assigned a fitness score when it first enters the fitness calculator module. This score is a percentage that represents how accurately the BN encoded by the individual (derivation tree) classifies a training set of instances. First, the fitness calculator decodes the individual to obtain its BN structure (phenotype). Then, for each node in the network, a probabilistic estimator included in the Weka framework estimates its conditional probability table (CPT) [17]. Briefly, this estimator estimates the probability of every value of the node, for each configuration of its parents, by counting the occurrences of that value within the training set. Once all the CPTs have been estimated, the fitness calculator uses the BN to classify the instances in the training set and computes the fitness of the individual as the percentage of correctly classified instances.
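The fitness computation described above can be sketched in Python as follows. This is a minimal, self-contained stand-in for the Weka estimator, not Weka's API. Several assumptions apply: the structure is already decoded into a dictionary mapping each node to its parents, instances are dictionaries of discrete values, the class variable is named "class" (a hypothetical name), and the additive count alpha is an assumption added to avoid zero probabilities (alpha = 0 gives plain counting). Each instance is classified by choosing the class value that maximizes the joint probability of the instance under the BN.

```python
from collections import Counter
from itertools import product
from math import prod

CLASS = "class"  # hypothetical name of the class variable


def estimate_cpts(structure, data, domains, alpha=0.5):
    """Estimate P(node = v | parents = u) for every node by counting.

    structure: dict mapping each node to a tuple of its parent nodes (phenotype)
    data:      list of dicts mapping each node to a discrete value
    domains:   dict mapping each node to the list of its possible values
    alpha:     additive pseudo-count (assumption; 0 means plain counting)
    """
    cpts = {}
    for node, parents in structure.items():
        joint = Counter()     # occurrences of (parent configuration, node value)
        marginal = Counter()  # occurrences of each parent configuration
        for row in data:
            u = tuple(row[p] for p in parents)
            joint[u, row[node]] += 1
            marginal[u] += 1
        k = len(domains[node])
        table = {}
        for u in product(*(domains[p] for p in parents)):
            denom = marginal[u] + alpha * k
            for v in domains[node]:
                # Fall back to a uniform distribution for unseen configurations.
                table[u, v] = (joint[u, v] + alpha) / denom if denom else 1.0 / k
        cpts[node] = table
    return cpts


def classify(structure, cpts, domains, instance):
    """Return the class value that maximizes the joint probability P(x, y)."""
    def joint_prob(y):
        x = {**instance, CLASS: y}  # overwrite the true label with candidate y
        return prod(cpts[n][tuple(x[p] for p in ps), x[n]]
                    for n, ps in structure.items())
    return max(domains[CLASS], key=joint_prob)


def fitness(structure, data, domains, alpha=0.5):
    """Percentage of training instances that the BN classifies correctly."""
    cpts = estimate_cpts(structure, data, domains, alpha)
    correct = sum(classify(structure, cpts, domains, row) == row[CLASS] for row in data)
    return 100.0 * correct / len(data)
```

For example, a naive-Bayes-shaped phenotype would be passed as {"class": (), "a": ("class",), "b": ("class",)}. Because all feature values are observed, maximizing P(x, y) over y is equivalent to maximizing P(y | x).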

3 Results

EvoBANE was used to classify two datasets from two different application domains, both taken from the UCI repository [20]. The first one, called “Vote”, classifies voters as “Republicans” or “Democrats” on the basis of sixteen feature variables. For this dataset, the CFG generator was set to generate the grammars G 16433, G 16333 and G 16133, to tackle the problem from three different angles. A genetic algorithm (GA) with a repair mechanism [17] was added as a fourth approach to compare its performance with EvoBANE. Each of these four approaches was executed 20 times, and each execution evolved a population of 10 individuals for 100 generations.

Table 1 shows the descriptive statistics of the results of the four approaches on both the training and test sets. The mean column shows that the EvoBANE approaches achieve the best results in both the training and testing phases. The standard deviation column indicates that the GA always reaches the same maximum fitness. This fitness is below the lower bound of the 95% confidence interval for the mean of the three EvoBANE approaches in both phases. One possible explanation is that the GA gets trapped in a local optimum with a fitness score of 92.7586 and is unable to explore the solution space as EvoBANE does.
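The confidence-interval comparison above can be reproduced with a short Python sketch. The text does not say how the intervals in Table 1 were computed; the sketch assumes a two-sided Student-t interval over the 20 runs of each approach (critical value t(0.975, 19) ≈ 2.093), and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_DF19 = 2.093  # two-sided 95% Student-t critical value, 19 degrees of freedom


def ci95(fitnesses, t_crit=T_CRIT_DF19):
    """95% confidence interval for the mean fitness over independent runs.

    The default t_crit assumes exactly 20 runs (19 degrees of freedom);
    pass a different critical value for another number of runs.
    """
    m = mean(fitnesses)
    half_width = t_crit * stdev(fitnesses) / sqrt(len(fitnesses))
    return m - half_width, m + half_width


def below_lower_bound(score, fitnesses, t_crit=T_CRIT_DF19):
    """True if a fixed score (e.g. the GA's constant fitness) lies under the interval."""
    return score < ci95(fitnesses, t_crit)[0]
```

Passing the 20 fitness values of one EvoBANE approach, together with the GA's constant score of 92.7586, to below_lower_bound mirrors the comparison made in the text.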

An analysis of variance (ANOVA) test was performed to compare statistically the mean fitness of the three EvoBANE approaches. Table 2 details the results of the ANOVA test, where the null hypotheses (mean fitness
