which defines the probabilistic model of a linear regression and forms the core
of its investigation.
That the assumption of Gaussian noise is sensible is discussed at length by
Maybeck [164, Chap. 1].
5.1.3 Maximum Likelihood and Least Squares
To model the matched observations, a classifier aims at maximising the probability of these observations given its model, as formally described by (4.24).
Combined with the linear model (5.3), the term to maximise by a single classifier k is given by

$$
\sum_{n=1}^{N} m(x_n) \ln p(y_n \mid x_n, w, \tau^{-1})
= \sum_{n=1}^{N} m(x_n) \left( -\frac{1}{2} \ln(2\pi) + \frac{1}{2} \ln \tau - \frac{\tau}{2} \left( w^T x_n - y_n \right)^2 \right). \tag{5.4}
$$
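As a concrete illustration, here is a minimal NumPy sketch of evaluating (5.4); the function name matched_log_likelihood and its signature are ours, not from the text:

```python
import numpy as np

def matched_log_likelihood(X, y, m, w, tau):
    """Evaluate (5.4) for a single classifier.

    X   : (N, D) matrix with one input vector x_n per row
    y   : (N,) vector of outputs y_n
    m   : (N,) matching values m(x_n)
    w   : (D,) weight vector
    tau : scalar noise precision
    """
    sq_err = (X @ w - y) ** 2  # (w^T x_n - y_n)^2 for all n
    ll_per_obs = -0.5 * np.log(2 * np.pi) + 0.5 * np.log(tau) - 0.5 * tau * sq_err
    return np.sum(m * ll_per_obs)
```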
As already shown in Sect. 4.2.1, maximising (5.4) with respect to the weight vector $w$ results in the weighted least squares problem,

$$
\min_{w} \sum_{n=1}^{N} m(x_n) \left( w^T x_n - y_n \right)^2, \tag{5.5}
$$

where the weights are given by the classifier's matching function. Thus, to determine $w$ by maximum likelihood, we only consider observations for which $m(x_n) > 0$, that is, which are matched.
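A minimal NumPy sketch of the batch solution to (5.5) follows; the function name fit_weights is ours. Scaling each input and output by the square root of its matching value reduces the weighted problem to ordinary least squares, which a standard solver handles directly:

```python
import numpy as np

def fit_weights(X, y, m):
    """Solve the weighted least squares problem (5.5).

    Multiplying row x_n and output y_n by sqrt(m(x_n)) turns
    sum_n m(x_n) (w^T x_n - y_n)^2 into an ordinary least
    squares objective.
    """
    s = np.sqrt(m)
    w, *_ = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)
    return w
```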
To determine the noise precision of the fitted model, we maximise (5.4) with respect to $\tau$, resulting in the problem

$$
\max_{\tau} \left( \ln(\tau) \sum_{n=1}^{N} m(x_n) - \tau \sum_{n=1}^{N} m(x_n) \left( w^T x_n - y_n \right)^2 \right), \tag{5.6}
$$
where w is the weight vector determined by (5.5).
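Setting the derivative of (5.6) with respect to $\tau$ to zero yields $\tau^{-1} = \sum_n m(x_n)(w^T x_n - y_n)^2 / \sum_n m(x_n)$, that is, the inverse of the match-weighted mean squared error. A corresponding sketch (again, the function name is illustrative):

```python
import numpy as np

def fit_noise_precision(X, y, m, w):
    """Maximise (5.6): tau is the inverse of the match-weighted
    mean squared error of the fitted linear model."""
    sq_err = (X @ w - y) ** 2
    return np.sum(m) / np.sum(m * sq_err)
```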
The rest of this chapter is devoted to discussing batch and incremental learning solutions to (5.5) and (5.6), starting with batch learning.
5.2 Batch Learning Approaches to Regression
When performing batch learning, all data $\mathcal{D} = \{x_n, y_n\}$ is assumed to be available at once (see Sect. 3.1.5). Hence, we have full knowledge of $\{x_n, y_n\}$, $N$, and, knowing the current model structure $\mathcal{M}$, also of the classifier's matching function $m$.
Let us now apply this approach to find the classifier's model parameters by
solving (5.5) and (5.6).
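Before turning to the details, a brief synthetic-data example of the two sketches above (the data and the chosen matching function are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
X = np.column_stack([np.ones_like(x), x])      # bias term plus input
y = 2.0 - 3.0 * x + rng.normal(0.0, 0.1, 200)  # true noise std 0.1
m = (x < 0.5).astype(float)                    # classifier matches x < 0.5 only

w = fit_weights(X, y, m)               # approx. [2, -3]
tau = fit_noise_precision(X, y, m, w)  # approx. 1 / 0.1**2 = 100
```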
Notation

The following notation is used in this and the remaining chapters: let $x, y \in \mathbb{R}^M$ be vectors, and $A \in \mathbb{R}^{M \times M}$ a diagonal matrix. Let $\langle x, y \rangle = x^T y$ be the inner product.