the light of the available data. By Bayes' rule, p(c_k | M, D) ∝ p(D | c_k, M) p(c_k | M), where the classifier model evidence p(D | c_k, M) is, similarly to (7.2), given by

p(D | c_k, M) = ∫ p(D | θ_k, c_k, M) p(θ_k | c_k, M) dθ_k,    (8.14)

and matching needs to be taken into account when formulating p(D | θ_k, c_k, M).
As for good model structures, good classifiers are classifiers for which p(c_k | M, D) is large or, equivalently, for which p(D | c_k, M) is large, given uniform classifier priors p(c_k | M). Therefore, the mutation operator of the GA can be biased towards changing bad classifiers, and genetic operators can construct new individuals from good classifiers of other individuals, or prune individuals by removing bad classifiers.
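Such a biased mutation operator can be sketched in a few lines: given per-classifier log evidences log p(D | c_k, M), a softmax over the negated values makes low-evidence classifiers more likely to be picked for mutation. The function name and the use of a softmax are illustrative choices, not prescribed by the text.

```python
import numpy as np

def pick_classifier_to_mutate(log_evidences, rng=None):
    """Pick the index of the classifier to mutate, biased towards
    classifiers with low evidence p(D | c_k, M).

    `log_evidences` is a hypothetical per-classifier array of
    log p(D | c_k, M); negating it before the softmax makes bad
    classifiers more likely to be selected for mutation.
    """
    rng = np.random.default_rng(rng)
    scores = -np.asarray(log_evidences, dtype=float)
    scores -= scores.max()                       # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return rng.choice(len(probs), p=probs)
```

Any monotone transformation of the evidence would serve equally well; the softmax merely keeps the selection stochastic rather than deterministic.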
From the GA side itself, a more principled approach might be sought from evolutionary computation techniques where variable-length individuals are common. Genetic programming (GP) is one of them, as the program that each individual represents is not fixed in size. However, while the fitness of a program does not necessarily decrease with its complexity, Bayesian model selection penalises overly complex model structures. Thus, GP suffers from the problem of individuals growing without bound, which is naturally avoided in the approach presented here. Nonetheless, some of the theoretical results of GP might still be applicable to improving the GA's search of the model structure space.
Another route that can additionally improve the performance of the GA is to use Estimation of Distribution Algorithms (EDAs) [183], which improve the crossover operator by detecting and preserving building blocks. They do so by creating a statistical model of the high-fitness individuals of the current population and drawing samples from this model to create new individuals, rather than using standard crossover operators. Good building blocks are expected to appear in several individuals and consequently receive additional support in the model. The Bayesian Optimization Algorithm (BOA), for example, models the alleles of selected individuals by a Bayesian network that is sampled thereafter to produce new individuals [182].
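The model-then-sample cycle common to all EDAs is easiest to see in the univariate case. The sketch below implements one generation of UMDA, a deliberately simplified stand-in for BOA's Bayesian network: it fits independent per-allele marginals to the best half of a fixed-length binary population and samples a new population from them. The population layout and fitness interface are illustrative assumptions.

```python
import numpy as np

def umda_step(population, fitness, rng=None):
    """One generation of a univariate EDA (UMDA), a simplified stand-in
    for BOA: fit per-allele marginals to the best half of the population
    and sample a new population from them.

    `population` is a (pop_size, n_bits) 0/1 array and `fitness` maps a
    bit vector to a score; both are illustrative assumptions.
    """
    rng = np.random.default_rng(rng)
    scores = np.array([fitness(ind) for ind in population])
    elite = population[np.argsort(scores)[-len(population) // 2:]]
    # statistical model of the high-fitness individuals: per-bit frequency
    marginals = elite.mean(axis=0)
    # draw each new allele independently from its marginal
    return (rng.random(population.shape) < marginals).astype(int)
```

BOA replaces the independent marginals with a Bayesian network over the alleles, which is what lets it capture and preserve multi-allele building blocks.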
Currently, there exists no EDA that can handle variable-length individuals adequately [150]. The problem is that the space of possible classifiers, that is, the space of possible matching function parametrisations, is too large for similar classifiers to appear frequently within the population. The chance of having the same building blocks is even exponentially smaller [43, 150]. Despite these issues, it is still worth trying to construct an EDA that can be used with the population structure at hand, at least to provide a more principled crossover operator. What needs to be considered when constructing the statistical population model is the irrelevance of the location of a classifier within an individual. The only thing that matters is the classifier itself, and whether it frequently co-occurs with the same other classifiers. This should allow modelling and preserving building blocks within a set of classifiers. An additional enhancement of the model is to take the nature of the matching function into account, as done for Michigan-style LCS by Butz and Pelikan [54] and Butz et al. [55].
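A position-irrelevant population model of the kind described above can start from simple co-occurrence statistics: treat each individual as a set of classifiers and count how often single classifiers and pairs of classifiers occur together. The representation of classifiers as hashable identifiers is an assumption made here for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_model(individuals):
    """Build a simple statistical model of a variable-length population
    that ignores classifier position: individuals are treated as *sets*
    of classifiers, and the model records how often each classifier
    occurs and how often pairs of classifiers co-occur.

    `individuals` is assumed to be an iterable of sets of hashable
    classifier identifiers (e.g. canonicalised matching-function
    parametrisations).
    """
    singles, pairs = Counter(), Counter()
    for ind in individuals:
        classifiers = sorted(ind)   # canonical order: position is irrelevant
        singles.update(classifiers)
        pairs.update(combinations(classifiers, 2))
    return singles, pairs
```

Pairs with high co-occurrence counts relative to their marginals are candidate building blocks that a crossover or sampling operator could then keep together.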