Since each classifier is trained independently (see Sect. 4.4), this chapter focuses exclusively on the training of a single classifier k. To keep the notation uncluttered, the subscript k is dropped; that is, the classifier's matching function m_k is denoted m, the model parameters θ_k = {w_k, τ_k} become w and τ, and the estimate f_k provided by classifier k is denoted f. For any further variables introduced throughout this chapter, it will be stated explicitly whether they are local to a classifier.
Firstly, the linear regression classifier model and its underlying assumptions are introduced, followed in Sect. 5.2 by how to estimate its parameters if all training data is available at once. Incremental learning approaches are discussed in Sect. 5.3, where gradient-based and exact methods of tracking the optimal weight vector estimate are described. Estimating the noise variance simultaneously is discussed for both methods in Sect. 5.3.7. In Sect. 5.4, the slow convergence of gradient-based methods is demonstrated empirically. Turning to classification, the training of these models is discussed in Sect. 5.5, after which the chapter is summarised by putting its content into the context of current LCS.
5.1 Linear Classifier Models and Their Underlying Assumptions
Linear regression models were chosen as a good balance between the expressiveness of the model and the ease of training it (see Sect. 3.2.3). The univariate linear model has already been introduced in Sect. 4.2.1, but here its underlying assumptions and implications are considered in more detail.
5.1.1 Linear Models
A linear model assumes a linear relation between the inputs and the output, parametrised by a set of model parameters. Given an input vector x with D_X elements, the model is parametrised by the equally-sized random vector ω with realisation w, and assumes that the scalar output random variable υ with realisation y follows the relation

υ = ω^T x + ε,     (5.1)
where ε is a zero-mean Gaussian random variable that models the stochasticity of the process and the measurement noise. Hence, ignoring for now the noise term ε, it is assumed that the process generates the output by a weighted sum of the components of the input, as becomes very clear when considering a realisation w of ω and rewriting the inner product as

w^T x = Σ_i w_i x_i,     (5.2)

where w_i and x_i are the i-th elements of w and x, respectively.
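To make this generative assumption concrete, the following minimal Python sketch samples an output according to (5.1). The variable names are illustrative, and it is assumed here, as one common convention, that τ denotes the noise precision, so that ε has variance 1/τ:

```python
import numpy as np

rng = np.random.default_rng(0)

w = np.array([0.5, -1.2, 2.0])  # a realisation w of the weight vector omega
tau = 4.0                       # assumed noise precision; noise variance is 1/tau

def sample_output(x, w, tau, rng):
    """Draw y = w^T x + eps with eps ~ N(0, 1/tau), following Eq. (5.1)."""
    eps = rng.normal(loc=0.0, scale=np.sqrt(1.0 / tau))
    return w @ x + eps          # inner product w^T x = sum_i w_i x_i, Eq. (5.2)

x = rng.normal(size=3)          # an arbitrary input vector with D_X = 3 elements
y = sample_output(x, w, tau, rng)
```

Averaged over many draws, the sampled outputs scatter around the noise-free weighted sum w^T x with spread determined by the noise term.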
While linear models are usually augmented by a bias term to offset them from the origin, it will be assumed that the input vector always contains a single constant element (for example, fixed to 1), so that the weight associated with it takes the role of the bias term.
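Continuing the sketch above, absorbing the bias into the weight vector then amounts to prepending such a constant element to every input; the position and value used here are one common convention, not prescribed by the text:

```python
def add_bias(x):
    """Prepend a constant 1 so that the first weight acts as the bias."""
    return np.concatenate(([1.0], x))

x_aug = add_bias(x)                  # now has D_X + 1 elements
w_aug = np.concatenate(([0.7], w))   # first element is the (assumed) bias weight
y_aug = sample_output(x_aug, w_aug, tau, rng)
```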