the number of input connections to p = 1 for variable nodes, and to c = 2 for the
class node, respectively. Like Figure 2.b, Figure 3.b shows the genotype, sentence
and phenotype of an individual belonging to the language generated by G 3212.
The grammars in Figures 2.a and 3.a can be used to generate BN structures
that solve classification problems with n = 3 feature variables. They differ in that
the language generated by G 3212 builds a smaller search space than the one generated
by G 3323, lowering the complexity of EvoBANE's evolutionary process.
2.2 The Fitness Calculator
Every individual is assigned a fitness score when it first enters the fitness
calculator module. This score is a percentage that measures how accurately
the BN encoded by the individual (derivation tree) classifies a training set of
instances. First, the fitness calculator decodes the individual to output its BN
structure (phenotype). Then, for each node in the network, a probabilistic
estimator included in the Weka framework estimates its conditional probability
table (CPT) [17]. Briefly, this estimator estimates the probability of every value
in the node by counting the number of occurrences of that value within the
training set. Once all the CPTs have been estimated, the fitness calculator uses
the BN to classify the instances in the training set and calculates the fitness of
the individual as the percentage of correctly classified instances.
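The fitness computation described above can be illustrated with a minimal Python sketch. It is not the Weka estimator used by EvoBANE: the function names, the dictionary-based network representation, and the smoothing pseudo-count `alpha` are all assumptions made for illustration, and the sketch only handles discrete, fully observed data.

```python
from collections import Counter
from math import log


def estimate_cpts(structure, data, alpha=0.5):
    """Estimate every CPT by counting occurrences in the training set.

    structure: dict mapping each node name to a tuple of its parent names.
    data: list of dicts mapping node name -> discrete value.
    alpha: smoothing pseudo-count (an assumption of this sketch; it keeps
           unseen value combinations from getting probability zero).
    """
    values = {node: sorted({row[node] for row in data}) for node in structure}
    joint = Counter()   # (node, value, parent values) -> count
    parent = Counter()  # (node, parent values) -> count
    for row in data:
        for node, parents in structure.items():
            key = tuple(row[p] for p in parents)
            joint[(node, row[node], key)] += 1
            parent[(node, key)] += 1

    def prob(node, value, row):
        # P(node = value | parents as given in row), from smoothed counts.
        key = tuple(row[p] for p in structure[node])
        num = joint[(node, value, key)] + alpha
        den = parent[(node, key)] + alpha * len(values[node])
        return num / den

    return prob, values


def classify(structure, prob, values, row, class_node):
    """Return the class value that maximizes the joint log-probability."""
    def score(candidate_class):
        candidate = dict(row, **{class_node: candidate_class})
        return sum(log(prob(n, candidate[n], candidate)) for n in structure)
    return max(values[class_node], key=score)


def fitness(structure, data, class_node):
    """Percentage of training instances that the network classifies correctly."""
    prob, values = estimate_cpts(structure, data)
    hits = sum(
        classify(structure, prob, values, row, class_node) == row[class_node]
        for row in data
    )
    return 100.0 * hits / len(data)
```

In this sketch a phenotype is just the `structure` dictionary, so calling `fitness(structure, training_set, class_node)` on each decoded individual would return its score as a percentage.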
3 Results
EvoBANE was used to classify two datasets from two different application
domains, both extracted from the UCI repository [20]. The first one is
called “Vote” and classifies voters as “Republicans” or “Democrats” considering
sixteen feature variables. For this dataset, the CFG generator was set to generate
the grammars G 16433, G 16333, and G 16133, to tackle the problem from three
different angles. A genetic algorithm (GA) with a repair mechanism [17] was
added as a fourth approach to compare its performance with EvoBANE. These
four approaches were executed 20 times. Each execution was set up to evolve a
population of 10 individuals for 100 generations.
Table 1 shows the descriptive statistics of the results of the four approaches on
both the training and test sets. The mean column shows that the EvoBANE
approaches achieve the best results in both the training and testing phases. The
standard deviation column indicates that the GA always reaches the same maximum
fitness. This fitness is below the lower bound of the 95% confidence interval for
the mean of the three EvoBANE approaches in both the training and testing phases.
One possible explanation is that the genetic algorithm gets trapped in a local
optimum with a fitness score equal to 92.7586, and is unable to explore the
solution space as EvoBANE does.
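The comparison against the lower bound of a 95% confidence interval can be sketched as follows. The text does not say how the intervals in Table 1 were computed, so this sketch assumes a Student t interval over the 20 per-run fitness values of one approach; the function names and the hard-coded critical value for 19 degrees of freedom are illustrative choices, not part of EvoBANE.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% Student t critical value for 19 degrees of freedom,
# i.e. for the 20 executions per approach (assumed interval type).
T_CRIT_95_DF19 = 2.093


def mean_confidence_interval(scores, t_crit=T_CRIT_95_DF19):
    """Return the (lower, upper) bounds of the confidence interval for the mean."""
    half_width = t_crit * stdev(scores) / sqrt(len(scores))
    centre = mean(scores)
    return centre - half_width, centre + half_width


def is_below_interval(value, scores, t_crit=T_CRIT_95_DF19):
    """True if value lies under the lower bound of the interval for the mean."""
    lower, _ = mean_confidence_interval(scores, t_crit)
    return value < lower
```

With the per-run fitness values of an EvoBANE approach in `scores`, `is_below_interval(92.7586, scores)` would reproduce the check described above for the GA's constant fitness. A different number of runs needs a different `t_crit`.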
An analysis of variance (ANOVA) test was performed in order to statistically
compare the mean fitness of each of the three EvoBANE approaches. Table 2
details the results of the ANOVA test, where the null hypotheses (mean fitness
 