heuristic methods, ii) also performs better than the least squares approximation,
and iii) mixing as done in XCS(F) performs worse than all other methods.
In all experiments a set of K linear regression classifiers is created such that
the number of classifiers matching each input is about the same for all inputs.
These classifiers are trained on all available observations by batch learning,
before the mixing models are applied and their performance measured by the
likelihood (6.1). This setup was chosen for several reasons: firstly, mixing is only
required if several classifiers match the same input, which is provided by the
generated set of classifiers. Secondly, the classifiers are trained before the mixing
models are applied, since we want to compare the mixing models on the same
set of classifiers rather than study how classifier training and mixing
interact. Finally, the likelihood measure is used to compare the performance of
the mixing models, rather than some form of squared error or similar, as the aim
in this chapter is to discuss methods that maximise this likelihood, rather than
any other measure.
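To make this protocol concrete, the following minimal sketch batch-trains such a set of linear regression classifiers by least squares before any mixing model is applied. It assumes NumPy; the boolean matching matrix and all identifiers are illustrative rather than taken from a particular implementation, and the likelihood (6.1) itself is not reproduced here.

import numpy as np

def batch_train_classifiers(X, y, matching):
    """Batch-train each classifier by least squares on the inputs it matches.

    X        : (N, D) input matrix, one row per observation
    y        : (N,) output vector
    matching : (K, N) boolean array; matching[k, n] is True if
               classifier k matches input i_n
    Returns one (weights, noise variance) pair per classifier.
    """
    models = []
    for m in matching:
        Xk, yk = X[m], y[m]
        # Least-squares weights of this classifier's linear model
        w, _, _, _ = np.linalg.lstsq(Xk, yk, rcond=None)
        # Maximum-likelihood estimate of the model's noise variance
        var = np.mean((yk - Xk @ w) ** 2)
        models.append((w, var))
    return models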
6.3.1 Experimental Design
Regression Tasks. The mixing models are evaluated on four regression tasks
$f : \mathbb{R} \to \mathbb{R}$, given in Table 6.1. The input range is $[0, 1]$, and the output is
shifted and scaled such that $-0.5 \leq f(x) \leq 0.5$. 1000 observations $(i_n, f(i_n))$
are taken from the target function $f$ at regular intervals, from 0 to 1, to give
the output vector $\mathbf{y} = (f(i_1), \dots, f(i_{1000}))^T$. The input matrix for averaging
classifiers is given by $\mathbf{X} = (1, \dots, 1)^T$, and for classifiers that model straight
lines by a $1000 \times 2$ matrix $\mathbf{X}$ with the $n$th row given by $(1, i_n)$.
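The data setup just described amounts to the following minimal sketch, assuming NumPy; the argument f stands for any of the four target functions of Table 6.1, and all names are purely illustrative.

import numpy as np

def make_data(f, N=1000):
    i = np.linspace(0.0, 1.0, N)               # inputs i_1, ..., i_N at regular intervals
    y = f(i)                                   # output vector y = (f(i_1), ..., f(i_N))^T
    X_avg = np.ones((N, 1))                    # averaging classifiers: X = (1, ..., 1)^T
    X_lin = np.column_stack((np.ones(N), i))   # straight lines: n-th row is (1, i_n)
    return y, X_avg, X_lin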
Table 6.1. The set of functions used for evaluating the performance of the different
mixing models. The functions are taken from Donoho and Johnstone [73], and have
been previously used by Booker [23] in an LCS-related study. The functions are sampled
over the range $[0, 1]$ and their outputs are normalised to $-0.5 \leq f(x) \leq 0.5$.
Function    Definition
Blocks      $f(x) = \sum_j h_j K(x - x_j)$, $K(x) = (1 + \mathrm{sgn}(x))/2$,
            $(x_j) = (0.1, 0.13, 0.15, 0.23, 0.25, 0.40, 0.44, 0.65, 0.76, 0.78, 0.81)$,
            $(h_j) = (4, -5, 3, -4, 5, -4.2, 2.1, 4.3, -3.1, 5.1, -4.2)$.
Bumps       $f(x) = \sum_j h_j K((x - x_j)/w_j)$, $K(x) = (1 + |x|^4)^{-1}$,
            $(x_j) = (x_j)_{\mathrm{Blocks}}$,
            $(h_j) = (4, 5, 3, 4, 5, 4.2, 2.1, 4.3, 3.1, 5.1, 4.2)$,
            $(w_j) = (0.005, 0.005, 0.006, 0.01, 0.01, 0.03, 0.01, 0.01, 0.005, 0.008, 0.005)$.
Doppler     $f(x) = (x(1 - x))^{1/2} \sin\bigl(2\pi(1 + 0.05)/(x + 0.05)\bigr)$
Heavisine   $f(x) = 4 \sin(4\pi x) - \mathrm{sgn}(x - 0.3) - \mathrm{sgn}(0.72 - x)$
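The four functions can be written down directly from Table 6.1; the sketch below uses NumPy, and the min-max normalisation to $-0.5 \leq f(x) \leq 0.5$ is an assumption, as the text only states that the outputs are shifted and scaled into this range.

import numpy as np

x_j = np.array([0.1, 0.13, 0.15, 0.23, 0.25, 0.40,
                0.44, 0.65, 0.76, 0.78, 0.81])
h_blocks = np.array([4, -5, 3, -4, 5, -4.2, 2.1, 4.3, -3.1, 5.1, -4.2])
h_bumps = np.abs(h_blocks)        # Bumps uses the magnitudes of the Blocks heights
w_j = np.array([0.005, 0.005, 0.006, 0.01, 0.01,
                0.03, 0.01, 0.01, 0.005, 0.008, 0.005])

def blocks(x):
    K = lambda t: (1 + np.sign(t)) / 2
    return sum(h * K(x - xj) for h, xj in zip(h_blocks, x_j))

def bumps(x):
    K = lambda t: (1 + np.abs(t) ** 4) ** -1
    return sum(h * K((x - xj) / w) for h, xj, w in zip(h_bumps, x_j, w_j))

def doppler(x):
    return np.sqrt(x * (1 - x)) * np.sin(2 * np.pi * (1 + 0.05) / (x + 0.05))

def heavisine(x):
    return 4 * np.sin(4 * np.pi * x) - np.sign(x - 0.3) - np.sign(0.72 - x)

def normalise(y):
    # Shift and scale so that -0.5 <= y <= 0.5 (min-max; an assumption)
    return (y - y.min()) / (y.max() - y.min()) - 0.5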