5 Training the Classifiers
The model of a set of classifiers consists of the classifiers themselves and the
mixing model. The classifiers are localised linear regression or classification models
that are trained independently of each other, and their localisation is determined
by the matching function m_k. This chapter is entirely devoted to the training
of a single classifier. It mainly focuses on the linear regression models, but also
briefly discusses classification at the end of the chapter.
The linear classifier model was already introduced in Sect. 4.2.1; here, more
details are provided about its underlying assumptions, and about how it can be
trained by both batch and incremental learning. Most of the concepts
and methods in this chapter are well known in statistics (for example, [97]) and
adaptive filter theory (for example, [105]), but have not previously been put into
the context of LCS.
In training a classifier we focus on solving (4.24), which emerges from applying
the principle of maximum likelihood to the LCS model. Maximising the
likelihood minimises the empirical rather than the expected risk, which might
lead to overfitting. Nonetheless, it provides a first approach to training the
classifiers, and for regression models it results in parameter update equations
that are mostly equivalent to the ones used in XCS(F), which confirms that the
LCS model is structurally similar to XCS(F). Chapter 7 returns to dealing with
over- and underfitting, with methods that are closely related to the methods
derived in this chapter.
The classifier model parameters to estimate are the weight vector and its noise
variance for the linear regression model, and the weight vector alone for the
classification model. The noise variance is a good indicator of the goodness-of-fit
of the linear model, and a modified form of it is used to estimate the accuracy
of a classifier in XCS and its variants. As already discussed in Sect. 3.2.6, it is
also useful for guiding the model structure search, and thus having
a good estimate of the noise variance is advantageous. Therefore, special
emphasis is put on how to estimate it efficiently and accurately. For the classification
model, a classifier quality measure emerges naturally from the estimated weight
vector and does not need to be estimated separately, as shown in Sect. 5.5.
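To make the batch case concrete: for a linear regression classifier, maximum likelihood training reduces to a matching-weighted least squares problem, with the noise variance estimated from the residuals. The sketch below is a minimal illustration in Python, not the book's exact derivation; the function name, the form of the matching weights `m` (one value per input, taken from m_k), and the bias-augmented inputs are assumptions made here for illustration.

```python
import numpy as np

def train_linear_classifier(X, y, m):
    """Batch-train a single linear regression classifier.

    X : (N, D) input matrix (the caller may append a bias column)
    y : (N,) target outputs
    m : (N,) matching weights (assumed per-input values of m_k)
    Returns the weight vector w and the matching-weighted noise variance.
    """
    sqrt_m = np.sqrt(m)
    # Matching-weighted least squares: minimise sum_n m_n (w^T x_n - y_n)^2
    w, *_ = np.linalg.lstsq(X * sqrt_m[:, None], y * sqrt_m, rcond=None)
    residuals = y - X @ w
    # Noise variance as the matching-weighted mean squared residual
    noise_var = np.dot(m, residuals ** 2) / m.sum()
    return w, noise_var

# Illustrative data: noisy line y = 2x + 1, with all inputs fully matched
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 50)
X = np.column_stack([x, np.ones_like(x)])
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=50)
w, noise_var = train_linear_classifier(X, y, np.ones(50))
```

With all matching weights equal to one, this collapses to ordinary least squares; a classifier matching only part of the input space would receive weights near zero outside its matched region, localising the fit as described above.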