the classifiers were trained according to this probability. While the gating net-
work still has the same purpose when the classifiers are trained independently,
the estimated probability is not fed back to the classifiers anymore. The cost of
this lack of feedback is a worse fit of the model to the data, which results in a
lower likelihood of the data given the model structure.
Note, however, that independent classifier training only causes a change in
the likelihood in areas where more than one classifier matches the same input.
Hence, we only get a lower likelihood if classifiers have large areas of overlap, and
it is doubtful that such a solution is ever desired. Nonetheless, the potentially
worse fit needs to be offset by the model structure search to find solutions with
sufficiently minimal overlap between different classifiers.
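To make the notion of independent training concrete, the following sketch (illustrative NumPy code, not an implementation from the text) fits each classifier by least squares weighted only by its own matching function; no responsibilities estimated by the gating network are fed back into the fit:

```python
import numpy as np

def train_classifiers_independently(X, y, matching):
    """Fit each classifier by matching-weighted least squares.

    X        : inputs, shape (N, D)
    y        : outputs, shape (N,)
    matching : matching values m_k(x_n) in [0, 1], shape (K, N)
    """
    K, N = matching.shape
    D = X.shape[1]
    W = np.zeros((K, D))
    for k in range(K):
        m = matching[k]
        # Each classifier sees only the inputs it matches; no
        # responsibilities from the gating network enter the fit.
        # The small ridge term is a practical safeguard against
        # singular systems, not part of the model.
        A = X.T @ (m[:, None] * X) + 1e-8 * np.eye(D)
        b = X.T @ (m * y)
        W[k] = np.linalg.solve(A, b)
    return W
```

Because each fit depends only on that classifier's matched inputs, classifiers with overlapping matching regions are trained without regard to one another, which is exactly the independence discussed above.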
As the gating network is not gating the observations to the different classifiers
anymore, but rather mixes the independently trained classifier models to best
explain the available observations, it will in the remaining chapters be referred
to as the mixing model rather than the gating network.
4.5 A Brief Comparison to Linear LCS Models
The LCS model introduced in this chapter is a non-linear model, as both the classifiers and the mixing model have tunable parameters. In its structure it is very similar to XCS and its derivatives, as well as to other LCS that train their classifiers independently (for example, CCS [153, 154]).
Another popular structure for LCS models is a linear one, which is characterised by the output estimate $f_M(\mathbf{x}; \theta)$ being a linear function of the model parameters $\theta$. Assuming a linear classifier model (4.15) and an output estimate $f_M(\mathbf{x}; \theta)$ formed by the mean of $p(y \mid \mathbf{x}, \theta)$ by (4.8), this estimate is given by
$$
f_M(\mathbf{x}; \theta) = \mathrm{E}(y \mid \mathbf{x}, \theta) = \sum_{k=1}^{K} g_k(\mathbf{x}) \, \mathbf{w}_k^\top \mathbf{x}. \tag{4.25}
$$
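As a hypothetical illustration of (4.25), the following sketch computes this mixture prediction, assuming a matching-weighted softmax gating of the form used for the generalised MoE (function and variable names are assumptions, not the book's):

```python
import numpy as np

def predict_moe(x, W, V, matching):
    """Mixture prediction of eq. (4.25): sum_k g_k(x) w_k^T x.

    x        : input vector, shape (D,)
    W        : classifier weights w_k, shape (K, D)
    V        : gating parameters v_k, shape (K, D)
    matching : matching values m_k(x), shape (K,)
    """
    scores = V @ x
    # Matching-weighted softmax gating (stabilised against overflow);
    # assumes at least one classifier matches x, so g sums to 1.
    g = matching * np.exp(scores - scores.max())
    g = g / g.sum()
    return g @ (W @ x)  # sum_k g_k(x) * w_k^T x
```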
In order for $f_M$ to be linear in $\theta$, the $g_k$'s have to be independent of the parameters, unlike for the generalised MoE, where they are parametrised by $\mathbf{V} \subset \theta$. This causes the log-likelihood $l(\mathcal{D}; \theta)$ to be concave with respect to the $\mathbf{w}_k$'s, with a unique maximum that is easy to find.
The linear LCS model can have different instantiations by specifying the $g_k$'s differently. Due to their current use in LCS, two of these instantiations are of particular interest. The first is given by $g_k(\mathbf{x}) = m_k(\mathbf{x})$, such that (4.25) becomes
$$
f_M(\mathbf{x}; \theta) = \sum_{k=1}^{K} m_k(\mathbf{x}) \, \mathbf{w}_k^\top \mathbf{x}. \tag{4.26}
$$
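By contrast, a minimal sketch of (4.26), again with assumed names, drops the parametrised gating and weighs each classifier's model by its matching function alone:

```python
import numpy as np

def predict_linear_lcs(x, W, matching):
    """Linear LCS prediction of eq. (4.26): sum_k m_k(x) w_k^T x.

    The matching values act directly as gating weights and need not
    sum to one, so the predictions of all matching classifiers are
    summed rather than averaged.
    """
    return matching @ (W @ x)
```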
Therefore, for each input x , the models of all matching classifiers are effectively
agglomerated. This clearly shows that the classifiers do not form their predictions
independently. As a consequence, classifiers cannot be interpreted as localised
models, but are rather localised components of the model that is formed by the
 