the generalisation capabilities as it provides a more accurate and stable estimate
of the model quality of a classifier and subsequently a fitness estimate with the
same qualities [155].
Nonetheless, the linear regression training methods introduced in this chapter
are by no means to be interpreted as the ultimate methods for training the
classifier models. Rather, the procedure deployed in this chapter can be used
to adapt other parameter estimation techniques to LCS. Still, the RLS algorithm
is currently the best-known incremental method for tracking the optimal weight
estimate under the given assumptions, while simultaneously providing an accurate
estimate of the noise variance. Hence, given that one aims at minimising the
squared error (5.5), it should be the method of choice.
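To make the above concrete, the following is a minimal sketch of such an RLS update for a single classifier's linear model, including incremental tracking of the summed squared residuals via the product of pre- and post-update residuals, in the spirit of (5.68). The class name, the initialisation constant delta, and the matching-weighted bookkeeping are illustrative assumptions, not the book's notation:

```python
import numpy as np

class RLSClassifierModel:
    """Sketch: RLS weight update for one classifier, with
    incremental noise variance tracking (illustrative names)."""

    def __init__(self, dim, delta=1000.0):
        self.w = np.zeros(dim)        # weight vector estimate
        self.P = np.eye(dim) * delta  # inverse input covariance (large init)
        self.sse = 0.0                # summed squared residuals
        self.count = 0.0              # match-weighted sample count

    def update(self, x, y, m=1.0):
        """One incremental step with input x, output y, matching m."""
        if m <= 0.0:
            return
        Px = self.P @ x
        # Sherman-Morrison update of the inverse covariance
        self.P -= m * np.outer(Px, Px) / (1.0 + m * x @ Px)
        pre = y - self.w @ x          # residual before the weight update
        self.w += m * (self.P @ x) * pre
        post = y - self.w @ x         # residual after the weight update
        # the pre/post residual product gives the exact incremental
        # change of the summed squared error under RLS
        self.sse += m * pre * post
        self.count += m

    def noise_variance(self):
        return self.sse / max(self.count, 1.0)
```

The key property exploited here is that, under RLS, the summed squared error can be updated exactly from the residuals before and after each weight update, so no separate pass over the data is needed to estimate the noise variance.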
As an alternative to the squared error, which corresponds to the assumption of
Gaussian noise, one can instead aim at estimating the weight vector that
minimises the mean absolute error (5.75) [157]. This, however, requires
modifying the assumptions about the distributions of the different linear model
variables. Additionally, there is currently no known method that incrementally
tracks the optimal weight estimate in the way RLS does for the squared error
measure. This also means that (5.68) cannot be used to track the model error,
and slower gradient-based alternatives have to be applied.
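As a sketch of such a gradient-based alternative, a stochastic subgradient step on the absolute error |y - w.x| amounts to a sign-based update; the step size gamma and the function name are hypothetical choices, not the book's method:

```python
import numpy as np

def abs_error_step(w, x, y, m=1.0, gamma=0.01):
    """One stochastic subgradient step towards minimising the
    absolute error |y - w.x| for a matched observation; m is the
    matching degree, gamma an illustrative step size."""
    return w + gamma * m * np.sign(y - w @ x) * x
```

Since no counterpart of (5.68) exists for this error measure, the mean absolute error itself would have to be estimated separately, for example by a running average of the absolute residuals.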
With respect to classification, the training of an appropriate LCS model has
been discussed for both batch and incremental training. The method differs from
current XCS-based LCS, such as UCS [161], in that it neither requires augmenting
the input space by a separate class label (see Sect. 3.1.3), nor evaluates
classifiers based on how accurately their associated class is represented within
their matched area of the input space. Instead, no assumptions are made about
which class is modelled by a classifier, and the probability of having generated
the observations of either class is estimated. This estimate can additionally be
used to measure the quality of a classifier, based on the idea that good
classifiers predict a single class with high probability. This concept was first
applied in an XCS-like context by Dam, Abbass, and Lokan in a Bayesian
formulation for two-class classification problems, resulting in improved
performance and faster learning [67]. Further evaluation and extensions to
multi-class problems are still pending.
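A minimal sketch of this idea, assuming a batch setting: estimate per-classifier class probabilities from the matched observations and use the largest probability as a quality measure. The Laplace-style prior alpha and the function name are illustrative assumptions:

```python
import numpy as np

def class_probabilities(labels, matching, n_classes, alpha=1.0):
    """Estimate the probability that a classifier's matched
    observations belong to each class. labels: integer class per
    observation; matching: degree to which the classifier matches
    each observation; alpha: illustrative smoothing prior."""
    counts = np.full(n_classes, alpha)
    for y, m in zip(labels, matching):
        counts[y] += m
    p = counts / counts.sum()
    # a good classifier predicts a single class with high
    # probability, so max(p) can serve as a quality measure
    return p, p.max()
```

Note that nothing here assigns a class to the classifier in advance; the class distribution is estimated from the data, and the quality measure follows directly from it.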
A later chapter reconsiders the probabilistic structure of both the linear
regression and classification models, and shows how the development of a
probabilistic approach allows the model to be embedded in a fully Bayesian
framework that, in the regression case, also lends itself to multi-dimensional
output spaces. Before that, let us in the following chapter consider another
LCS component that, in contrast to the weight vector estimate of XCS, has
received hardly any attention in LCS research: how the local models provided
by the classifiers are combined to form a global model.