heuristic methods, ii) also performs better than the least squares approximation,
and iii) mixing as done in XCS(F) performs worse than all other methods.
In all experiments a set of K linear regression classifiers is created such that
the number of classifiers matching each input is about the same for all inputs.
These classifiers are trained on all available observations by batch learning,
before the mixing models are applied and their performance measured by the
likelihood (6.1). This setup was chosen for several reasons: firstly, mixing is only
required if several classifiers match the same input, which is provided by the
generated set of classifiers. Secondly, the classifiers are trained before the mixing
models are applied, since we want to compare the mixing models on the same
set of classifiers rather than study how classifier training and mixing
interact. Finally, the likelihood measure is used to compare the performance of
the mixing models, rather than some form of squared error or similar, as the aim
in this chapter is to discuss methods that maximise this likelihood, rather than
any other measure.
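To make this protocol concrete, the following minimal sketch batch-trains such a set of linear regression classifiers by least squares before any mixing model is applied. It assumes NumPy; the boolean matching matrix and all identifiers are illustrative rather than taken from a particular implementation, and the likelihood (6.1) itself is not reproduced here.

import numpy as np

def batch_train_classifiers(X, y, matching):
    """Batch-train each classifier by least squares on the inputs it matches.

    X        : (N, D) input matrix, one row per observation
    y        : (N,) output vector
    matching : (K, N) boolean array; matching[k, n] is True if
               classifier k matches input i_n
    Returns one (weights, noise variance) pair per classifier.
    """
    models = []
    for m in matching:
        Xk, yk = X[m], y[m]
        # Least-squares weights of this classifier's linear model
        w, _, _, _ = np.linalg.lstsq(Xk, yk, rcond=None)
        # Maximum-likelihood estimate of the model's noise variance
        var = np.mean((yk - Xk @ w) ** 2)
        models.append((w, var))
    return models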
6.3.1 Experimental Design
Regression Tasks. The mixing models are evaluated on four regression tasks
$f : \mathbb{R} \to \mathbb{R}$, given in Table 6.1. The input range is $[0, 1]$, and the output is
shifted and scaled such that $-0.5 \leq f(x) \leq 0.5$. 1000 observations $(i_n, f(i_n))$
are taken from the target function $f$ at regular intervals, from 0 to 1, to give
the output vector $\mathbf{y} = (f(i_1), \dots, f(i_{1000}))^T$. The input matrix for averaging
classifiers is given by $\mathbf{X} = (1, \dots, 1)^T$, and for classifiers that model straight
lines by a $1000 \times 2$ matrix $\mathbf{X}$ with the $n$th row given by $(1, i_n)$.
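The data setup just described amounts to the following minimal sketch, assuming NumPy; the argument f stands for any of the four target functions of Table 6.1, and all names are purely illustrative.

import numpy as np

def make_data(f, N=1000):
    i = np.linspace(0.0, 1.0, N)               # inputs i_1, ..., i_N at regular intervals
    y = f(i)                                   # output vector y = (f(i_1), ..., f(i_N))^T
    X_avg = np.ones((N, 1))                    # averaging classifiers: X = (1, ..., 1)^T
    X_lin = np.column_stack((np.ones(N), i))   # straight lines: n-th row is (1, i_n)
    return y, X_avg, X_lin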
Table 6.1. The set of functions used for evaluating the performance of the different
mixing models. The functions are taken from Donoho and Johnstone [73], and have
been previously used by Booker [23] in an LCS-related study. The functions are sampled
over the range $[0, 1]$ and their outputs are normalised to $-0.5 \leq f(x) \leq 0.5$.
Function    Definition
Blocks      $f(x) = \sum_j h_j K(x - x_j)$, $K(x) = (1 + \mathrm{sgn}(x))/2$,
            $(x_j) = (0.1, 0.13, 0.15, 0.23, 0.25, 0.40, 0.44, 0.65, 0.76, 0.78, 0.81)$,
            $(h_j) = (4, -5, 3, -4, 5, -4.2, 2.1, 4.3, -3.1, 5.1, -4.2)$.
Bumps       $f(x) = \sum_j h_j K((x - x_j)/w_j)$, $K(x) = (1 + |x|^4)^{-1}$,
            $(x_j) = (x_j)_{\mathrm{Blocks}}$,
            $(h_j) = (4, 5, 3, 4, 5, 4.2, 2.1, 4.3, 3.1, 5.1, 4.2)$,
            $(w_j) = (0.005, 0.005, 0.006, 0.01, 0.01, 0.03, 0.01, 0.01, 0.005, 0.008, 0.005)$.
Doppler     $f(x) = (x(1 - x))^{1/2} \sin\bigl(2\pi(1 + 0.05)/(x + 0.05)\bigr)$
Heavisine   $f(x) = 4 \sin(4\pi x) - \mathrm{sgn}(x - 0.3) - \mathrm{sgn}(0.72 - x)$
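The four functions can be written down directly from Table 6.1; the sketch below uses NumPy, and the min-max normalisation to $-0.5 \leq f(x) \leq 0.5$ is an assumption, as the text only states that the outputs are shifted and scaled into this range.

import numpy as np

x_j = np.array([0.1, 0.13, 0.15, 0.23, 0.25, 0.40,
                0.44, 0.65, 0.76, 0.78, 0.81])
h_blocks = np.array([4, -5, 3, -4, 5, -4.2, 2.1, 4.3, -3.1, 5.1, -4.2])
h_bumps = np.abs(h_blocks)        # Bumps uses the magnitudes of the Blocks heights
w_j = np.array([0.005, 0.005, 0.006, 0.01, 0.01,
                0.03, 0.01, 0.01, 0.005, 0.008, 0.005])

def blocks(x):
    K = lambda t: (1 + np.sign(t)) / 2
    return sum(h * K(x - xj) for h, xj in zip(h_blocks, x_j))

def bumps(x):
    K = lambda t: (1 + np.abs(t) ** 4) ** -1
    return sum(h * K((x - xj) / w) for h, xj, w in zip(h_bumps, x_j, w_j))

def doppler(x):
    return np.sqrt(x * (1 - x)) * np.sin(2 * np.pi * (1 + 0.05) / (x + 0.05))

def heavisine(x):
    return 4 * np.sin(4 * np.pi * x) - np.sign(x - 0.3) - np.sign(0.72 - x)

def normalise(y):
    # Shift and scale so that -0.5 <= y <= 0.5 (min-max; an assumption)
    return (y - y.min()) / (y.max() - y.min()) - 0.5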