the light of the available data. By Bayes' rule, p(c_k | M, D) ∝ p(D | c_k, M) p(c_k | M), where the classifier model evidence p(D | c_k, M) is, similarly to (7.2), given by

p(D | c_k, M) = ∫ p(D | θ_k, c_k, M) p(θ_k | c_k, M) dθ_k,    (8.14)

and matching needs to be taken into account when formulating p(D | θ_k, c_k, M).
As for good model structures, good classifiers are classifiers for which p(c_k | M, D) is large or, equivalently, for which p(D | c_k, M) is large, given uniform classifier priors p(c_k | M). Therefore, the mutation operator of the GA can be biased towards changing bad classifiers, and genetic operators can construct new individuals from good classifiers of other individuals, or prune individuals by removing bad classifiers.
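Such a biased mutation operator can be sketched in a few lines: given per-classifier log evidences log p(D | c_k, M), a softmax over the negated values makes low-evidence classifiers more likely to be picked for mutation. The function name and the use of a softmax are illustrative choices, not prescribed by the text.

```python
import numpy as np

def pick_classifier_to_mutate(log_evidences, rng=None):
    """Pick the index of the classifier to mutate, biased towards
    classifiers with low evidence p(D | c_k, M).

    `log_evidences` is a hypothetical per-classifier array of
    log p(D | c_k, M); negating it before the softmax makes bad
    classifiers more likely to be selected for mutation.
    """
    rng = np.random.default_rng(rng)
    scores = -np.asarray(log_evidences, dtype=float)
    scores -= scores.max()                       # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return rng.choice(len(probs), p=probs)
```

Any monotone transformation of the evidence would serve equally well; the softmax merely keeps the selection stochastic rather than deterministic.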
From the GA side itself, a more principled approach might be sought from evolutionary computation techniques where variable-length individuals are common. Genetic programming (GP) is one of them, as the program that each individual represents is not fixed in size. However, while the fitness of a program does not necessarily decrease with its complexity, Bayesian model selection penalises overly complex model structures. Thus, GP suffers from the problem of individuals growing without bound, which is naturally avoided in the approach presented here. Nonetheless, some of the theoretical results of GP might still be applicable to improving the GA's search of the model structure space.
Another route that can additionally improve the performance of the GA is to use Estimation of Distribution Algorithms (EDAs) [183], which improve the crossover operator by detecting and preserving building blocks. They do so by creating a statistical model of the high-fitness individuals of the current population and drawing samples from this model to create new individuals, rather than using standard crossover operators. Good building blocks are expected to appear in several individuals and consequently receive additional support in the model. The Bayesian Optimization Algorithm (BOA), for example, models the alleles of selected individuals by a Bayesian network that is sampled thereafter to produce new individuals [182].
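The model-then-sample cycle common to all EDAs is easiest to see in the univariate case. The sketch below implements one generation of UMDA, a deliberately simplified stand-in for BOA's Bayesian network: it fits independent per-allele marginals to the best half of a fixed-length binary population and samples a new population from them. The population layout and fitness interface are illustrative assumptions.

```python
import numpy as np

def umda_step(population, fitness, rng=None):
    """One generation of a univariate EDA (UMDA), a simplified stand-in
    for BOA: fit per-allele marginals to the best half of the population
    and sample a new population from them.

    `population` is a (pop_size, n_bits) 0/1 array and `fitness` maps a
    bit vector to a score; both are illustrative assumptions.
    """
    rng = np.random.default_rng(rng)
    scores = np.array([fitness(ind) for ind in population])
    elite = population[np.argsort(scores)[-len(population) // 2:]]
    # statistical model of the high-fitness individuals: per-bit frequency
    marginals = elite.mean(axis=0)
    # draw each new allele independently from its marginal
    return (rng.random(population.shape) < marginals).astype(int)
```

BOA replaces the independent marginals with a Bayesian network over the alleles, which is what lets it capture and preserve multi-allele building blocks.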
Currently, there exists no EDA that can handle variable-length individuals adequately [150]. The problem is that the space of possible classifiers, that is, the space of possible matching function parametrisations, is too large for similar classifiers to appear frequently within the population. The chance of having the same building blocks is even exponentially smaller [43, 150]. Despite these issues, it is still worth trying to construct an EDA that can be used with the population structure at hand, at least to provide a more principled crossover operator. What needs to be considered when constructing the statistical population model is the irrelevance of the location of a classifier within an individual. The only thing that matters is the classifier itself, and whether it frequently co-occurs with the same other classifiers. This should allow modelling and preserving building blocks within a set of classifiers. An additional enhancement of the model is to take the nature of the matching function into account, as done for Michigan-style LCS by Butz and Pelikan [54] and Butz et al. [55].
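A position-irrelevant population model of the kind described above can start from simple co-occurrence statistics: treat each individual as a set of classifiers and count how often single classifiers and pairs of classifiers occur together. The representation of classifiers as hashable identifiers is an assumption made here for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_model(individuals):
    """Build a simple statistical model of a variable-length population
    that ignores classifier position: individuals are treated as *sets*
    of classifiers, and the model records how often each classifier
    occurs and how often pairs of classifiers co-occur.

    `individuals` is assumed to be an iterable of sets of hashable
    classifier identifiers (e.g. canonicalised matching-function
    parametrisations).
    """
    singles, pairs = Counter(), Counter()
    for ind in individuals:
        classifiers = sorted(ind)   # canonical order: position is irrelevant
        singles.update(classifiers)
        pairs.update(combinations(classifiers, 2))
    return singles, pairs
```

Pairs with high co-occurrence counts relative to their marginals are candidate building blocks that a crossover or sampling operator could then keep together.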