performs worst even on the smoothest function in the test set, which is the
Heavisine function.
Overall, these experiments confirm empirically that IRLS performs best. However, due to its high computational complexity and poor scaling properties, it is not recommended for applications that require a large number of classifiers.
While the least squares approximation could be used as an alternative in such cases, the results suggest that InvVar performs better. Additionally, InvVar is easier to implement than LS and LSf, and requires no incremental update. Thus, it should be the preferred method to use.
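To make the InvVar heuristic concrete, the following minimal sketch (function names and the variance estimates are illustrative, not from the original text) mixes local classifier predictions with weights proportional to each matching classifier's inverse noise variance:

    import numpy as np

    def inv_var_mixing(match, local_pred, noise_var):
        # match      : (K,) matching degrees m_k(x) of the K classifiers
        # local_pred : (K,) local model predictions for input x
        # noise_var  : (K,) estimated noise variances of the local models
        # Mixing weights g_k(x) are proportional to m_k(x) / variance_k,
        # normalised over all matching classifiers.
        raw = match / noise_var
        g = raw / raw.sum()
        # The global prediction is the weighted sum of local predictions.
        return np.dot(g, local_pred)

    # Two matching classifiers; the low-variance one dominates the mix.
    print(inv_var_mixing(np.array([1.0, 1.0]),
                         np.array([0.2, 0.8]),
                         np.array([0.01, 0.1])))

Since the weights depend only on quantities the classifiers already maintain, no separate mixing parameters need to be trained or updated incrementally.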
6.4 Relation to Previous Work and Alternatives
A closely related previous study has investigated mixing models for LCS with
the aim of minimising the mean squared error of the global prediction rather than
maximising its likelihood [83]. Formally, the aim was to find a mixing model that
minimises
\[
\sum_{n=1}^{N} \big( \hat{f}(x_n) - f(x_n) \big)^2 , \qquad (6.35)
\]
where $f$ is the target function, and $\hat{f}(x_n)$ is the global output prediction for input $x_n$. This problem statement can be derived from a model that assumes the relation between $f$ and $\hat{f}$ to be $f(x) = \hat{f}(x) + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma^2)$ is a zero-mean constant variance Gaussian that represents the random noise. The maximum likelihood estimate for the parameters of $\hat{f}$ is found by maximising $\sum_n \ln \mathcal{N}\big(f(x_n) \,|\, \hat{f}(x_n), \sigma^2\big)$, which is equivalent to minimising (6.35).
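This equivalence follows directly from expanding the Gaussian log-density:

\[
\sum_{n=1}^{N} \ln \mathcal{N}\big( f(x_n) \,\big|\, \hat{f}(x_n), \sigma^2 \big)
= -\frac{N}{2} \ln (2\pi\sigma^2)
  - \frac{1}{2\sigma^2} \sum_{n=1}^{N} \big( f(x_n) - \hat{f}(x_n) \big)^2 ,
\]

where the first term does not depend on $\hat{f}$, so for a fixed $\sigma^2$, maximising the log-likelihood with respect to the parameters of $\hat{f}$ is the same as minimising the summed squared error (6.35).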
In the LCS model with linear regression classifiers, introduced in Chap. 4, on the other hand, zero-mean constant variance Gaussian noise is assumed on each local model $p(y \,|\, x, \theta_k)$ rather than the global model $p(y \,|\, x, \theta)$. These models are related by $p(y \,|\, x, \theta) = \sum_k g_k(x) \, p(y \,|\, x, \theta_k)$, and as $g_k(x)$ might change with $x$,
the noise variance of the global model is very likely not constant. As a result, the
maximum likelihood estimate for the LCS model as introduced in Chap. 4 does
not conform to minimising (6.35). Nonetheless, the results based on minimising (6.35) are qualitatively the same, as they show that, amongst the heuristics, InvVar features competitive performance, is usually better than Conf and MaxConf, and always outperforms XCS.
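That the global noise variance is indeed input-dependent can be checked numerically. The following sketch (a hypothetical two-classifier setup, not taken from the original text) mixes two constant-variance local Gaussian models with input-dependent weights and computes the variance of the resulting mixture via the law of total variance:

    import numpy as np

    w = np.array([0.5, -1.0])          # local weights w_k for a 1-D input
    var_local = np.array([0.05, 0.2])  # constant local noise variances

    def mixture_variance(x, g):
        # Variance of p(y|x) = sum_k g_k(x) N(y | w_k x, var_k):
        # expected local variance plus variance of the local means
        # under the mixing distribution g.
        mu = w * x                     # local means w_k x
        mean_global = np.dot(g, mu)    # mixture mean
        return np.dot(g, var_local + mu**2) - mean_global**2

    # Input-dependent mixing weights: classifier 2 dominates for large x.
    for x in [0.0, 0.5, 1.0]:
        g2 = 1.0 / (1.0 + np.exp(-4.0 * (x - 0.5)))
        print(x, mixture_variance(x, np.array([1.0 - g2, g2])))

The printed variances differ across inputs, illustrating that a mixture of constant-variance local models does not, in general, have a constant global variance.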
Modelling the noise at the local model level rather than the global model level is required to train the classifiers independently. It also makes explicit the need for a mixing model. In contrast, one could, as in Sect. 4.5, assume a linear LCS model that features noise at the global level, such that an output y given some input x is modelled by
\[
p(y \,|\, x, \theta) = \mathcal{N}\bigg( y \,\bigg|\, \sum_{k=1}^{K} g_k(x) \, w_k^\top x , \; \tau^{-1} \bigg) , \qquad (6.36)
\]
where $g_k(x)$ is some function of the matching functions $m_k(x)$, independent of $\theta$. In such a case, one could interpret the values of $g_k(x)$ to form the mixing model.
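As an illustration of (6.36), the following sketch (function and variable names are illustrative; it assumes the weights g_k(x) are already normalised and that τ is a shared noise precision) evaluates the density of an output y under this global-noise model:

    import numpy as np

    def global_noise_density(y, x, W, g, tau):
        # W   : (K, D) matrix whose rows are the classifier weights w_k
        # g   : (K,) mixing weights g_k(x), assumed to sum to 1
        # tau : shared noise precision; the global variance is 1/tau
        mean = np.dot(g, W @ x)        # sum_k g_k(x) w_k^T x
        var = 1.0 / tau
        return np.exp(-0.5 * (y - mean)**2 / var) / np.sqrt(2.0 * np.pi * var)

    # Example with two classifiers on a 2-D input.
    W = np.array([[0.5, 1.0], [-0.3, 0.2]])
    print(global_noise_density(0.7, np.array([1.0, 0.5]), W,
                               g=np.array([0.6, 0.4]), tau=4.0))

In contrast to the Chap. 4 model, the noise here is attached to the already mixed mean, so the global variance 1/τ is constant by construction, which brings the maximum likelihood estimate of this model in line with minimising (6.35).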
 