performs worst even on the smoothest function in the test set, which is the
Heavisine function.
Overall, these experiments confirm empirically that IRLS performs best. However, due to its high computational complexity and poor scaling properties, it is not recommended for applications that require a large number of classifiers.
While the least squares approximation could be used as an alternative in such cases, the results suggest that InvVar performs better. Additionally, InvVar is easier to implement than LS and LSf, and requires no incremental update. Thus, it should be the preferred method to use.
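To make the InvVar heuristic concrete, the following minimal sketch (function names and the variance estimates are illustrative, not from the original text) mixes local classifier predictions with weights proportional to each matching classifier's inverse noise variance:

    import numpy as np

    def inv_var_mixing(match, local_pred, noise_var):
        # match      : (K,) matching degrees m_k(x) of the K classifiers
        # local_pred : (K,) local model predictions for input x
        # noise_var  : (K,) estimated noise variances of the local models
        # Mixing weights g_k(x) are proportional to m_k(x) / variance_k,
        # normalised over all matching classifiers.
        raw = match / noise_var
        g = raw / raw.sum()
        # The global prediction is the weighted sum of local predictions.
        return np.dot(g, local_pred)

    # Two matching classifiers; the low-variance one dominates the mix.
    print(inv_var_mixing(np.array([1.0, 1.0]),
                         np.array([0.2, 0.8]),
                         np.array([0.01, 0.1])))

Since the weights depend only on quantities the classifiers already maintain, no separate mixing parameters need to be trained or updated incrementally.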
6.4 Relation to Previous Work and Alternatives
A closely related previous study has investigated mixing models for LCS with
the aim of minimising the mean squared error of the global prediction rather than
maximising its likelihood [83]. Formally, the aim was to find a mixing model that
minimises
\[
\sum_{n=1}^{N} \big( \hat{f}(x_n) - f(x_n) \big)^2 , \qquad (6.35)
\]
where $f$ is the target function, and $\hat{f}(x_n)$ is the global output prediction for input $x_n$. This problem statement can be derived from a model that assumes the relation between $f$ and $\hat{f}$ to be $f(x) = \hat{f}(x) + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma^2)$ is a zero-mean constant variance Gaussian that represents the random noise. The maximum likelihood estimate for the parameters of $\hat{f}$ is found by maximising $\sum_n \ln \mathcal{N}\big(f(x_n) \,|\, \hat{f}(x_n), \sigma^2\big)$, which is equivalent to minimising (6.35).
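This equivalence follows directly from expanding the Gaussian log-density:

\[
\sum_{n=1}^{N} \ln \mathcal{N}\big( f(x_n) \,\big|\, \hat{f}(x_n), \sigma^2 \big)
= -\frac{N}{2} \ln (2\pi\sigma^2)
  - \frac{1}{2\sigma^2} \sum_{n=1}^{N} \big( f(x_n) - \hat{f}(x_n) \big)^2 ,
\]

where the first term does not depend on $\hat{f}$, so for a fixed $\sigma^2$, maximising the log-likelihood with respect to the parameters of $\hat{f}$ is the same as minimising the summed squared error (6.35).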
In the LCS model with linear regression classifiers, introduced in Chap. 4, on the other hand, zero-mean constant variance Gaussian noise is assumed on each local model $p(y \,|\, x, \theta_k)$ rather than the global model $p(y \,|\, x, \theta)$. These models are related by $p(y \,|\, x, \theta) = \sum_k g_k(x) \, p(y \,|\, x, \theta_k)$, and as $g_k(x)$ might change with $x$,
the noise variance of the global model is very likely not constant. As a result, the
maximum likelihood estimate for the LCS model as introduced in Chap. 4 does
not conform to minimising (6.35). Nonetheless, the results based on minimising (6.35) are qualitatively the same, as they show that, amongst the heuristics, InvVar features competitive performance, is usually better than Conf and MaxConf, and always outperforms XCS.
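That the global noise variance is indeed input-dependent can be checked numerically. The following sketch (a hypothetical two-classifier setup, not taken from the original text) mixes two constant-variance local Gaussian models with input-dependent weights and computes the variance of the resulting mixture via the law of total variance:

    import numpy as np

    w = np.array([0.5, -1.0])          # local weights w_k for a 1-D input
    var_local = np.array([0.05, 0.2])  # constant local noise variances

    def mixture_variance(x, g):
        # Variance of p(y|x) = sum_k g_k(x) N(y | w_k x, var_k):
        # expected local variance plus variance of the local means
        # under the mixing distribution g.
        mu = w * x                     # local means w_k x
        mean_global = np.dot(g, mu)    # mixture mean
        return np.dot(g, var_local + mu**2) - mean_global**2

    # Input-dependent mixing weights: classifier 2 dominates for large x.
    for x in [0.0, 0.5, 1.0]:
        g2 = 1.0 / (1.0 + np.exp(-4.0 * (x - 0.5)))
        print(x, mixture_variance(x, np.array([1.0 - g2, g2])))

The printed variances differ across inputs, illustrating that a mixture of constant-variance local models does not, in general, have a constant global variance.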
Modelling the noise at the local model level rather than the global model level is required to train the classifiers independently. It also makes explicit the need for a mixing model. In contrast, one could, as in Sect. 4.5, assume a linear LCS model that features noise at the global level, such that an output y given some input x is modelled by
\[
p(y \,|\, x, \theta) = \mathcal{N}\bigg( y \,\bigg|\, \sum_{k=1}^{K} g_k(x) \, w_k^\top x , \; \tau^{-1} \bigg) , \qquad (6.36)
\]
where $g_k(x)$ is some function of the matching functions $m_k(x)$, independent of $\theta$. In such a case, one could interpret the values of $g_k(x)$ to form the mixing model.
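As an illustration of (6.36), the following sketch (function and variable names are illustrative; it assumes the weights g_k(x) are already normalised and that τ is a shared noise precision) evaluates the density of an output y under this global-noise model:

    import numpy as np

    def global_noise_density(y, x, W, g, tau):
        # W   : (K, D) matrix whose rows are the classifier weights w_k
        # g   : (K,) mixing weights g_k(x), assumed to sum to 1
        # tau : shared noise precision; the global variance is 1/tau
        mean = np.dot(g, W @ x)        # sum_k g_k(x) w_k^T x
        var = 1.0 / tau
        return np.exp(-0.5 * (y - mean)**2 / var) / np.sqrt(2.0 * np.pi * var)

    # Example with two classifiers on a 2-D input.
    W = np.array([[0.5, 1.0], [-0.3, 0.2]])
    print(global_noise_density(0.7, np.array([1.0, 0.5]), W,
                               g=np.array([0.6, 0.4]), tau=4.0))

In contrast to the Chap. 4 model, the noise here is attached to the already mixed mean, so the global variance 1/τ is constant by construction, which brings the maximum likelihood estimate of this model in line with minimising (6.35).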
 