Computing this probability density requires a Bayesian LCS model that was introduced by adding priors to the probabilistic model from Chap. 4. Additionally, the flexibility of the regression classifier model was increased from univariate to multivariate regression. The requirement of specifying prior parameters is not a weakness of this approach, but rather a strength, as the priors make explicit the commonly implicit assumptions made about the data-generating process.
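To make explicit where these priors enter, the model posterior follows from Bayes' rule in the standard way (the notation below is illustrative, with θ collecting all model parameters):

\[
p(\mathcal{M} \mid \mathcal{D}) \propto p(\mathcal{D} \mid \mathcal{M})\, p(\mathcal{M}),
\qquad
p(\mathcal{D} \mid \mathcal{M}) = \int p(\mathcal{D} \mid \boldsymbol{\theta}, \mathcal{M})\, p(\boldsymbol{\theta} \mid \mathcal{M})\, \mathrm{d}\boldsymbol{\theta},
\]

so the parameter prior p(θ|M) is precisely where the assumptions about the data-generating process are stated explicitly.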
Variational Bayesian inference was employed to find a closed-form solution to p(M|D), in combination with various approximations to handle the generalised softmax function that is used to combine the local classifier models into a global model. Whilst variational Bayesian inference usually provides a lower bound L(q) on ln p(D|M) that is directly related to p(M|D), these approximations invalidate the lower-bound nature of L(q). Even without these approximations, the use of L(q) for selecting the best set of classifiers depends very much on the tightness of the bound, and on whether this tightness is consistent across different model structures M. Variational Bayesian inference has been shown to perform well in practice [216, 19], and the same approximations that were used here were successfully applied to the Mixtures-of-Experts model [226, 227]. Thus, the presented method can be expected to perform well when applied to LCS, but more definite statements require further empirical investigation.
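For reference, the usual relationship between the variational bound and the model evidence is (again in illustrative notation, with q(θ) denoting the variational posterior):

\[
\ln p(\mathcal{D} \mid \mathcal{M})
= \mathcal{L}(q) + \mathrm{KL}\big(q(\boldsymbol{\theta}) \,\|\, p(\boldsymbol{\theta} \mid \mathcal{D}, \mathcal{M})\big),
\qquad
\mathcal{L}(q) = \int q(\boldsymbol{\theta}) \ln \frac{p(\mathcal{D}, \boldsymbol{\theta} \mid \mathcal{M})}{q(\boldsymbol{\theta})}\, \mathrm{d}\boldsymbol{\theta}.
\]

As the Kullback-Leibler divergence is non-negative, L(q) ≤ ln p(D|M); it is this guarantee that the additional softmax approximations invalidate.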
What was introduced in this chapter is the first formal and general definition
of what it means for a set of classifiers to be optimal, using the best applicable of
the currently common model selection approaches. The definition is general as
i) it is independent of the representation of the matching function, ii) it can be
used for both discrete and continuous input spaces, iii) it can handle matching
by degree, and iv) it is not restricted to the LCS model that is introduced
in this work but is applicable to all LCS model types that can be described
probabilistically, including the linear LCS model. The reader is reminded that
the definition itself is independent of the variational inference, and thus is not
affected by the issues that are introduced through approximating the posterior.
A further significant advancement that comes with the definition of optimality is
a Bayesian model for LCS that goes beyond the probabilistic model as it makes
the prior assumptions about the data-generating process explicit. Additionally,
the use of multivariate regression is a novelty in the LCS context.
Defining the best set of classifiers as a maximisation problem also promotes its theoretical investigation: depending on the LCS model type, one could, for example, ask whether the optimal set of classifiers is ever overlapping. In other words, does the optimal set of classifiers include classifiers that are responsible for the same input and thus have overlapping matching? If the removal of overlaps increases p(M|D) in all cases, then it does not. Such knowledge can guide model structure search itself, as it can avoid classifier constellations that are very likely suboptimal. Thus, further research in this area is not only of theoretical value but can also guide the design of other LCS components.
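To illustrate how such a result could prune model structure search, the following sketch greedily removes one classifier of each overlapping pair whenever doing so does not lower a generic model score, which stands in for L(q) or any other approximation of ln p(M|D). All names, the matching representation, and the greedy strategy are hypothetical and only illustrate the idea, not the method of this work:

from itertools import combinations

def overlaps(m1, m2, inputs):
    """Two classifiers overlap if they both match some common input."""
    return any(m1(x) and m2(x) for x in inputs)

def prune_overlaps(classifiers, score, inputs):
    """Greedily drop one classifier of an overlapping pair whenever the
    model score (a stand-in for L(q)) does not decrease."""
    current = list(classifiers)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(current, 2):
            if not overlaps(a, b, inputs):
                continue
            for drop in (a, b):
                candidate = [c for c in current if c is not drop]
                if score(candidate) >= score(current):
                    current = candidate
                    improved = True
                    break
            if improved:
                break
    return current

# Hypothetical usage: interval-matching classifiers on a discretised input space.
xs = [i / 100 for i in range(101)]
classifiers = [lambda x, lo=lo, hi=hi: lo <= x < hi
               for (lo, hi) in [(0.0, 0.6), (0.4, 1.0)]]
toy_score = lambda cs: -len(cs)  # hypothetical score favouring fewer classifiers
print(len(prune_overlaps(classifiers, toy_score, xs)))  # -> 1

If removing overlaps could ever decrease the score, such a greedy search would no longer be safe; this is exactly why the theoretical question raised above matters for the design of the search.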