4.4 Independent Classifier Training
The standard MoE model assumes that each observation is generated by one and only one classifier. This was generalised by adding the restriction that a classifier can have generated an observation only if it matches the input associated with that observation, thereby adding an additional layer of forced localisation of the classifiers in the input space.
Here, a change rather than a generalisation is introduced to the model assumptions: as before, the data is assumed to be generated by a combination of localised processes, but the role of a classifier changes from cooperating with other classifiers to locally model the observations that it matches, to modelling all observations that it matches, independently of the other classifiers that match the same inputs. This distinction becomes clearer once the resulting formal differences have been discussed in Sects. 4.4.2 and 4.4.3.
The motivation behind this change is twofold: firstly, it removes local maxima and thus simplifies classifier training; secondly, it simplifies the intuition behind what a classifier models. These motivations are discussed in more detail below, followed by their implications for training the model and for the assumptions about the data-generating process.
4.4.1 The Origin of Local Maxima
Following the discussion in Sect. 4.1.5, local maxima of the likelihood function are
the result of the simultaneous training of the classifiers and the gating network. In
the standard MoE model, this simultaneous training is necessary to provide the
localisation of the classifiers in the input space. In the introduced generalisation,
on the other hand, a preliminary layer of localisation is provided by the matching
function, and the interaction between classifiers and the gating network is only
required for inputs that are matched by more than one classifier. This was already
demonstrated in Sect. 4.3.3, where it was shown that classifiers acquire full
responsibility for inputs that they match alone. Hence, in the generalised MoE,
local maxima only arise when classifiers overlap in the input space.
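To illustrate why overlap is required for local maxima to arise, consider the following sketch in Python. The matching-weighted softmax form of the gating is taken from the generalised model of Sect. 4.3; all variable names and the toy matching functions are illustrative, not from the text. For an input matched by a single classifier, that classifier's gating value, and hence its responsibility, is 1 regardless of the gating parameters, so the gating has nothing to learn there.

import numpy as np

def gating(x, V, match):
    # Matching-restricted softmax: g_k(x) proportional to m_k(x) exp(v_k' x)
    m = np.array([m_k(x) for m_k in match])
    g = m * np.exp(V @ x)
    return g / g.sum()

# Two classifiers: classifier 0 matches x < 0.5, classifier 1 matches
# x > 0.3 (inputs carry a leading bias element).
match = [lambda x: float(x[1] < 0.5), lambda x: float(x[1] > 0.3)]
x_solo = np.array([1.0, 0.1])     # matched by classifier 0 alone
x_overlap = np.array([1.0, 0.4])  # matched by both classifiers

np.random.seed(0)
for V in (np.zeros((2, 2)), np.random.randn(2, 2)):
    print(gating(x_solo, V), gating(x_overlap, V))
# gating(x_solo, V) is [1, 0] for every V: the gating parameters, and with
# them the classifier/gating interaction, only matter where classifiers
# overlap.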
4.4.2 What Does a Classifier Model?
By (4.14), a classifier aims at maximising the sum of log-likelihoods of all observations, weighted by the responsibilities. By (4.12) and (4.22), these responsibilities can only be non-zero if the classifier matches the corresponding inputs, that is, $r_{nk} > 0$ only if $m_k(\mathbf{x}_n) > 0$. Hence, by maximising (4.14), a classifier only considers observations that it matches.
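For linear-Gaussian classifiers, as used in the preceding sections, this maximisation has a closed form: it reduces to responsibility-weighted least squares, in which observations with $r_{nk} = 0$ simply drop out of every sum. The following sketch makes this concrete; the routine name and its interface are illustrative rather than taken from the text.

import numpy as np

def fit_classifier(X, y, r_k):
    # Maximise sum_n r_nk ln N(y_n | w'x_n, var) over w and var. Unmatched
    # observations have r_nk = 0 and so contribute nothing to any sum.
    W = np.diag(r_k)
    w = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    resid = y - X @ w
    var = (r_k * resid**2).sum() / r_k.sum()
    return w, var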
Given that an observation $(\mathbf{x}_n, \mathbf{y}_n)$ is matched by a single classifier $k$, it was established in Sect. 4.3.3 that $r_{nk} = 1$ and $r_{nj} = 0$ for all $j \neq k$. Hence, (4.14) assigns full weight to classifier $k$ when maximising the likelihood of this observation. Consequently, given that all observations that a classifier matches are matched by that classifier alone, it is trained on them independently of all other classifiers.
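Taken to its conclusion, this suggests the training scheme that gives the section its name: each classifier can be fitted on its own, with its matching function taking the place of the responsibilities as the weight. A minimal sketch, reusing the fit_classifier routine from above (train_independently is again an illustrative name):

def train_independently(X, y, match):
    # Fit every classifier to all observations it matches, weighted by its
    # matching function alone; no gating quantities enter the objective.
    return [fit_classifier(X, y, np.array([m_k(x) for x in X]))
            for m_k in match]

Because each per-classifier objective no longer depends on the other classifiers or on the gating network, the local maxima discussed in Sect. 4.4.1 cannot arise within it.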
 