combination of all classifiers. While this by itself is not necessarily a drawback, the need to re-train overlapping classifiers whenever classifiers are added or removed while searching the model structure space is a clear disadvantage of the linear structure, and generally of all structures that do not train classifiers independently. Also, due to the interaction of overlapping classifiers, there is no clear indicator of the quality of a single classifier. LCS instances that use this agglomerating structure are ZCS [236], as identified by Wada et al. [224], and an LCS developed by Booker [23]. In both cases, the quality measure of classifier $k$ is a measure of the magnitude of its parameters $w_k$, a method called “fitness sharing” in the case of ZCS$^6$.
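To make the agglomerating structure concrete, the following minimal sketch in Python evaluates the unweighted sum $\sum_k m_k(x) w_k^\top x$ together with a magnitude-based quality measure, assuming binary matching functions $m_k(x) \in \{0, 1\}$. The matching functions and weight vectors are hypothetical, chosen only to illustrate the structure; this is not the actual ZCS or Booker implementation.

```python
import numpy as np

def agglomerated_prediction(x, matching, weights):
    """f_M(x; theta) = sum_k m_k(x) w_k^T x, with m_k(x) in {0, 1}."""
    return sum(m_k(x) * w_k @ x for m_k, w_k in zip(matching, weights))

def magnitude_quality(weights):
    """Classifier quality read off as the magnitude of its parameters w_k."""
    return [np.linalg.norm(w_k) for w_k in weights]

# Two overlapping hypothetical classifiers over the feature vector x = (1, x_1).
matching = [lambda x: 1.0,                          # matches all inputs
            lambda x: 1.0 if x[1] > 0.5 else 0.0]   # matches x_1 > 0.5
weights = [np.array([0.2, 1.0]), np.array([-0.1, 0.4])]

x = np.array([1.0, 0.7])
print(agglomerated_prediction(x, matching, weights))  # 0.9 + 0.18 = 1.08
print(magnitude_quality(weights))                     # ~[1.02, 0.41]
```

Because the output is the plain sum of all matching classifiers' predictions, overlapping classifiers jointly form the prediction and none of them can be judged, or re-trained, in isolation.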
An alternative to agglomerating classifiers in linear models is to average over them by using $g_k(x) = m_k(x) / \sum_{k'} m_{k'}(x)$, such that (4.25) becomes

$$f_M(x; \theta) = \sum_{k=1}^{K} \frac{m_k(x)}{\sum_{k'} m_{k'}(x)} w_k^\top x. \qquad (4.27)$$
Note that this form is very similar to the gating network (4.22) of the generalised MoE, with the difference that the average is not weighted by the quality of the classifiers' predictions. Thus, the fit of this model will certainly be worse than the weighted averaging of the generalised MoE. Also, even though the predictions of overlapping classifiers no longer directly depend on each other, the value of $g_k(x)$ still depends on the other classifiers matching the same input $x$. Thus, classifiers are not trained independently, and they need to be re-trained when overlapping classifiers are removed or added. An instance of this form of linear LCS was introduced by Wada et al. as a linearised version of XCS [223].
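As a minimal sketch of this averaged form, the snippet below (reusing the hypothetical classifiers from the previous example) computes $g_k(x) = m_k(x) / \sum_{k'} m_{k'}(x)$ and the prediction (4.27). The second call shows that removing an overlapping classifier changes the remaining $g_k(x)$, which is precisely why classifiers cannot be trained independently under this structure.

```python
import numpy as np

def averaged_prediction(x, matching, weights):
    # Assumes at least one classifier matches x, i.e. m.sum() > 0.
    m = np.array([m_k(x) for m_k in matching])
    g = m / m.sum()                 # g_k(x) = m_k(x) / sum_k' m_k'(x)
    return sum(g_k * w_k @ x for g_k, w_k in zip(g, weights))

matching = [lambda x: 1.0,
            lambda x: 1.0 if x[1] > 0.5 else 0.0]
weights = [np.array([0.2, 1.0]), np.array([-0.1, 0.4])]

x = np.array([1.0, 0.7])
print(averaged_prediction(x, matching, weights))          # g = [0.5, 0.5] -> 0.54
print(averaged_prediction(x, matching[:1], weights[:1]))  # g = [1.0]     -> 0.9
```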
It needs to be emphasised that this section is not meant to demonstrate the superiority of the introduced LCS model and its currently used instances over LCS based on linear models. Rather, it attempts to point out significant differences between these two model types and their consequences. Having a linear model structure removes the need for an explicit mixing model and simplifies finding the model parameters for a fixed model structure, but this comes at the price of having to re-train the model once this structure changes. Using non-linear models, on the other hand, requires a mixing model and the introduction of independent classifier training (as a rather unsatisfying solution) to simplify the training of a single model structure, but it simplifies changing this structure and provides a clearer interpretation of the model formed by a single classifier.
$^6$ It is not clear whether such a quality measure is indeed useful on all occasions. Booker proposed to consider classifiers with low parameter values as bad classifiers, as “The ones with large weights are the most important terms in the approximation” [24], but would that also work in cases where low parameter values are actually good parameter values? One can easily imagine a part of a function that is constantly zero and thus requires zero-valued parameters to model it.
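The footnote's scenario is easy to reproduce. In the hypothetical sketch below, two classifiers are fit by least squares to disjoint regions of a target function that is zero on one region and linear on the other; the classifier covering the zero region obtains (correctly) near-zero weights, so a magnitude-based quality measure would rank this perfectly fitting classifier worst.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(0.0, 1.0, 100)
X = np.column_stack([np.ones_like(x1), x1])   # features (1, x_1)
y = np.where(x1 > 0.5, x1, 0.0)               # target: 0 for x_1 <= 0.5

for lo, hi in [(0.0, 0.5), (0.5, 1.0)]:       # two non-overlapping classifiers
    idx = (x1 > lo) & (x1 <= hi)
    w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    print((lo, hi), np.round(w, 3), "||w|| =", round(np.linalg.norm(w), 3))
```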
 