equivalent to having each classifier match all inputs. This results in a set of classifiers that all match the whole input space, and localisation is performed solely by the soft linear partitioning induced by the gating network.
4.3.5 Relation to LCS
This generalised MoE model satisfies all characteristics of LCS outlined in Sect. 3.2: each classifier describes a localised model with its localisation determined by the model structure, and the local models are combined to form a global model. So, given that the model can be trained efficiently, and that there exists a good mechanism for searching the space of model structures, do we already have an LCS? While some LCS researchers might disagree, partly because there is no universal definition of what an LCS is, and because LCS appear to be mostly thought of in algorithmic terms rather than in terms of the model that they describe, the author believes that this is the case.
However, the generalised MoE model has a feature that no LCS has ever used: beyond the localisation of classifiers by their matching functions, the responsibilities of classifiers that share matching inputs are further distributed by the softmax function. While this feature might lead to a better fit of the model to the data, it blurs the observation/classifier association by extending it beyond the matching function. Nonetheless, the introduced transfer function φ can be used to level this effect: when it is defined as the identity function, φ(x) = x, then by (4.21) the probability of a certain classifier generating an observation for a matching input is log-linearly related to the input x. By setting φ(x) = 1 for all x, however, the relation reduces to g_k(x) ∝ m_k(x) exp(v_k), where the gating vector v_k reduces to the scalar v_k. Hence, the gating weight becomes independent of the input (besides the matching) and relies only on the constant v_k through exp(v_k). In areas of the input space that several classifiers match, classifiers with a larger v_k have a stronger influence when forming the prediction of the global model, as they have a higher gating weight. To summarise, setting φ(x) = 1 makes gating independent of the input (besides the matching), with the gating weight of each classifier determined by the single scalar v_k. Further details and alternative models for the gating network are discussed in Chap. 6.
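To make the effect of the two choices of φ concrete, the following sketch (illustrative code, not from the text) computes the matched softmax gating weights g_k(x) ∝ m_k(x) exp(v_k^T φ(x)) for both φ(x) = x and φ(x) = 1; the matching functions and gating parameters are arbitrary assumptions.

    import numpy as np

    def gating_weights(x, matching, V, phi):
        # Matched softmax gating: g_k(x) proportional to m_k(x) * exp(v_k . phi(x)).
        # Classifiers that do not match x (m_k(x) = 0) get zero gating weight.
        # Assumes at least one classifier matches x, so the sum is nonzero.
        m = np.array([m_k(x) for m_k in matching])
        g = m * np.exp(np.asarray(V) @ np.atleast_1d(phi(x)))
        return g / g.sum()

    # Two hypothetical classifiers: one matches x_1 + x_2 >= 3, one matches everywhere.
    matching = [lambda x: 1.0 if x[0] + x[1] >= 3 else 0.0,
                lambda x: 1.0]
    x = np.array([2.0, 2.0])  # both classifiers match this input

    # phi(x) = x: gating weights are log-linear in the input.
    V_lin = np.array([[0.5, -0.2], [0.1, 0.3]])  # one gating vector v_k per classifier
    print(gating_weights(x, matching, V_lin, phi=lambda x: x))

    # phi(x) = 1: gating depends only on the scalars v_k (and on matching).
    V_const = np.array([[0.7], [0.2]])  # each v_k reduces to a scalar
    print(gating_weights(x, matching, V_const, phi=lambda x: np.ones(1)))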
Note that φ(x) = 1 is not applicable in the standard MoE model, that is, when all classifiers match the full input space. In that case we would have neither localisation by matching nor localisation by the softmax function, and hence the global model would be no better at modelling the data than a single local model applied to the whole data.
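Continuing the sketch above, this degenerate case is easy to see: if every classifier matches every input and φ(x) = 1, the gating weights are identical constants for all inputs, so the mixture collapses to one fixed blend of the local models.

    # All-matching classifiers with phi(x) = 1: the gating weights no longer
    # depend on x at all, so no localisation remains.
    match_all = [lambda x: 1.0, lambda x: 1.0]
    for x in (np.array([0.0, 0.0]), np.array([5.0, 5.0])):
        print(gating_weights(x, match_all, V_const, phi=lambda x: np.ones(1)))
    # Both inputs produce the same gating weights.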
Example 4.2 (Localisation by Matching and the Softmax Function). Consider the same setting as in Example 4.1, and additionally φ(x) = x for all x and the matching functions

    m_1(x) = 1 if x_1 + x_2 ≥ 3, and m_1(x) = 0 otherwise.    (4.23)
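A direct transcription of (4.23) in the same style as the sketch above (the ≥ reading of the threshold is assumed from the text):

    def m_1(x):
        # Matching function (4.23): classifier 1 matches inputs with x_1 + x_2 >= 3.
        return 1.0 if x[0] + x[1] >= 3 else 0.0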