A Probabilistic Model for LCS - Design and Analysis of Learning Classifier Systems


[Figure: graphical model with nodes m_nk, x_n, m_k, θ_k and y_n, and a plate labelled N (data).]

Fig. 4.6. Directed graphical model for training classifier k independently. See the

caption of Fig. 4.1 for instructions on how to read this graph. Note that the values of

the matching random variables m_nk are determined by the matching function m_k and

the inputs x_n.

An additional consequence of classifiers being trained independently of the

responsibilities is that for standard choices of the local models (see, for example

[121]), the log-likelihood (4.24) is concave for each classifier. Therefore, it has a

unique maximum and consequently we cannot get stuck in local maxima when

training individual classifiers.
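As an illustration of why independent training has a single optimum, the following is a minimal sketch that assumes a linear classifier model with Gaussian noise and assumes that the log-likelihood (4.24) weights each observation by its matching value. Neither the model choice nor the names `train_classifier`, `match` and `noise_var` come from the text; they are hypothetical. Under these assumptions the maximum is the solution of a matching-weighted least-squares problem, which can be computed in closed form.

```python
import numpy as np

def train_classifier(X, y, match):
    """Fit one classifier by matching-weighted least squares.

    X: (N, D) inputs, y: (N,) outputs, match: (N,) matching values in [0, 1].
    Assumes a linear model with Gaussian noise (an illustrative choice).
    """
    # Weighted normal equations: (X^T M X) w = X^T M y, with M = diag(match)
    A = X.T @ (match[:, None] * X)
    b = X.T @ (match * y)
    w = np.linalg.solve(A, b)
    # Maximum-likelihood noise variance over the matched observations
    resid = y - X @ w
    noise_var = np.sum(match * resid**2) / np.sum(match)
    return w, noise_var

# Synthetic data: two linear regimes that meet at x = 0.5
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
X = np.column_stack([np.ones_like(x), x])  # bias term plus input
y = np.where(x < 0.5, 1.0 + 2.0 * x, 4.0 - 3.0 * x)
y = y + 0.01 * rng.normal(size=x.size)

# A classifier that matches only the left half of the input space
match_left = (x < 0.5).astype(float)
w_left, var_left = train_classifier(X, y, match_left)
```

Because the weighted objective is concave in the weights, the solution does not depend on an initial guess, and the fit uses only the classifier's own matching values, not the responsibilities.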

4.4.4 Training the Gating Network

Training the gating network remains unchanged, and therefore is described by

(4.12) and (4.13). Given a set of trained classifiers, the responsibilities are fully

specified by evaluating (4.12). Hence, the log-likelihood of the gating network

(4.13) is a concave function (for example, [20]), and therefore has a unique

maximum.

Thus, the classifier models have unique optima and can be trained

independently of the gating network by maximising a concave log-likelihood function.

Furthermore, the gating network depends on the goodness-of-fit of the classifiers,

but as they are trained independently, the log-likelihood function of the gating

network is also concave. Therefore, the complete model has a unique maximum

likelihood, and as a consequence, the second goal of removing local maxima to

ease training of the model is reached.
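The concavity of the gating objective can be checked numerically. The sketch below assumes a softmax gating function weighted by matching, g_k(x) proportional to m_k(x) exp(v_k . x), and an objective of the form sum_n sum_k r_nk ln g_k(x_n) with the responsibilities held fixed. Equations (4.12) and (4.13) are not reproduced in this excerpt, so both forms, as well as the names `gating`, `objective`, `V`, `M` and `R`, are assumptions made for illustration. Plain gradient ascent stands in for whatever optimiser is actually used.

```python
import numpy as np

def gating(V, X, M):
    """Assumed gating: g[n, k] = m_k(x_n) exp(v_k . x_n), normalised over k.

    Requires every input to be matched by at least one classifier.
    """
    A = X @ V.T
    A = A - A.max(axis=1, keepdims=True)  # shift for numerical stability
    E = M * np.exp(A)
    return E / E.sum(axis=1, keepdims=True)

def objective(V, X, M, R):
    """Assumed gating log-likelihood: sum of r_nk * ln g_k(x_n)."""
    G = gating(V, X, M)
    mask = R > 0  # skip terms with zero responsibility
    return np.sum(R[mask] * np.log(G[mask]))

rng = np.random.default_rng(1)
N, K = 200, 2
x = rng.uniform(0.0, 1.0, N)
X = np.column_stack([np.ones(N), x])  # bias term plus input
M = np.ones((N, K))                   # both classifiers match everywhere

# Fixed, hypothetical responsibilities: classifier 0 dominates for x < 0.5
r0 = np.where(x < 0.5, 0.9, 0.1)
R = np.column_stack([r0, 1.0 - r0])

V = np.zeros((K, X.shape[1]))
history = [objective(V, X, M, R)]
for _ in range(200):
    G = gating(V, X, M)
    grad = (R - G).T @ X              # gradient when each row of R sums to 1
    V = V + 0.2 * grad / N            # small fixed step on the mean gradient
    history.append(objective(V, X, M, R))
```

With a sufficiently small step, the recorded objective should never decrease, which is the behaviour expected when ascending a concave function.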

4.4.5 Implications on Likelihood and Assumptions about the Data

For observations that are matched by more than one classifier, letting a classifier

model match each observation with equal weight violates the assumption that each

observation was generated by one and only one classifier. Rather, the model

of each classifier can be interpreted as a hypothesis for a data-generating process

that generated all observations of the matched area of the input space.

The gating network, on the other hand, was previously responsible for

modelling the probabilities of some classifier having produced some observation, and
