5 Training the Classifiers
The model of a set of classifiers consists of the classifiers themselves and the
mixing model. The classifiers are localised linear regression or classification models
that are trained independently of each other, and their localisation is determined
by the matching function m_k. This chapter is entirely devoted to the training
of a single classifier. It mainly focuses on the linear regression models, but also
briefly discusses classification at the end of the chapter.
The linear classifier model was already introduced in Sect. 4.2.1; here, more
details are provided about its underlying assumptions, and about how it can be
trained by both batch and incremental learning. Most of the concepts
and methods in this chapter are well known in statistics (for example, [97]) and
adaptive filter theory (for example, [105]), but have not previously been put into
the context of LCS.
In training a classifier we focus on solving (4.24), which emerges from applying
the principle of maximum likelihood to the LCS model. Maximising the
likelihood minimises the empirical rather than the expected risk, which might
lead to overfitting. Nonetheless, it provides a first approach to training the
classifiers, and for regression models it results in parameter update equations
that are mostly equivalent to the ones used in XCS(F), which confirms that the
LCS model is structurally similar to XCS(F). Chapter 7 returns to dealing with
over- and underfitting, with methods that are closely related to the methods
derived in this chapter.
The classifier model parameters to estimate are the weight vector and its noise
variance for the linear regression model, and the weight vector alone for the
classification model. The noise variance is a good indicator of the goodness-of-fit
of the linear model, and a modified form of it is used to estimate the accuracy
of a classifier in XCS and its variants. As already discussed in Sect. 3.2.6, it is
also useful for guiding the model structure search, and thus having
a good estimate of the noise variance is advantageous. Therefore, special
emphasis is put on how to estimate it efficiently and accurately. For the classification
model, a classifier quality measure emerges naturally from the estimated weight
vector and does not need to be estimated separately, as shown in Sect. 5.5.
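To make the batch case concrete: for a linear regression classifier, maximum likelihood training reduces to a matching-weighted least squares problem, with the noise variance estimated from the residuals. The sketch below is a minimal illustration in Python, not the book's exact derivation; the function name, the form of the matching weights `m` (one value per input, taken from m_k), and the bias-augmented inputs are assumptions made here for illustration.

```python
import numpy as np

def train_linear_classifier(X, y, m):
    """Batch-train a single linear regression classifier.

    X : (N, D) input matrix (the caller may append a bias column)
    y : (N,) target outputs
    m : (N,) matching weights (assumed per-input values of m_k)
    Returns the weight vector w and the matching-weighted noise variance.
    """
    sqrt_m = np.sqrt(m)
    # Matching-weighted least squares: minimise sum_n m_n (w^T x_n - y_n)^2
    w, *_ = np.linalg.lstsq(X * sqrt_m[:, None], y * sqrt_m, rcond=None)
    residuals = y - X @ w
    # Noise variance as the matching-weighted mean squared residual
    noise_var = np.dot(m, residuals ** 2) / m.sum()
    return w, noise_var

# Illustrative data: noisy line y = 2x + 1, with all inputs fully matched
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 50)
X = np.column_stack([x, np.ones_like(x)])
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=50)
w, noise_var = train_linear_classifier(X, y, np.ones(50))
```

With all matching weights equal to one, this collapses to ordinary least squares; a classifier matching only part of the input space would receive weights near zero outside its matched region, localising the fit as described above.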