itself is a well-known problem with a multitude of approaches that go far
beyond the ones described in this chapter. Nonetheless, it is usually not stated
as such in the LCS literature, nor approached from first principles.
Additional novelties in the LCS context are a probabilistic interpretation of the
linear model and its noise structure, the resulting explicit formulation of the
predictive density, and rigorous batch and incremental estimates of the noise
variance.
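To make the batch noise-variance estimate concrete, the following is a minimal sketch, not the book's own code: it assumes the standard match-weighted unbiased estimator for a linear model (weighted squared residuals divided by the effective matched count minus the input dimensionality), with the function name and toy data being illustrative choices.

```python
import numpy as np

def batch_noise_variance(X, y, w, m):
    """Match-weighted unbiased batch estimate of a linear classifier's
    noise variance: weighted squared residuals over the effective
    sample count minus the input dimensionality (assumed form)."""
    r = y - X @ w                # residuals of the fitted linear model
    D = X.shape[1]               # input dimensionality
    return float(np.sum(m * r**2) / (np.sum(m) - D))

# toy data: y = 0.5 + 2x plus Gaussian noise with std 0.1,
# all observations fully matched (m = 1)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.uniform(-1, 1, 100)])
y = X @ np.array([0.5, 2.0]) + rng.normal(0.0, 0.1, 100)
w, *_ = np.linalg.lstsq(X, y, rcond=None)
m = np.ones(100)                 # matching degree per observation
var = batch_noise_variance(X, y, w, m)
```

With fully matched data this reduces to the familiar unbiased residual-variance estimate; partial matching degrees simply down-weight observations the classifier only partly matches.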
The weight update of the original XCS conforms to (5.25) with x_n = 1 for
n > 0, and hence aims at minimising the squared error (5.5). Later, XCS was
modified to act as a regression model [240], and extended to XCSF to model
straight lines [241] by using the NLMS update (5.29), again without explicitly
stating a single classifier's aim. In a similar manner, the classifier model was
extended to a full linear model [141]^4.
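The two update rules discussed above can be sketched as follows. This is an illustration under the assumption that (5.25) is the standard LMS recursion and (5.29) the standard normalised LMS recursion; the step sizes and toy targets are arbitrary choices, not taken from the text.

```python
import numpy as np

def lms_constant(w, y, gamma=0.1):
    """LMS with constant input x_n = 1: the classifier tracks a
    running estimate of the mean output, as in the original XCS."""
    return w + gamma * (y - w)

def nlms(w, x, y, gamma=0.1, eps=1e-12):
    """Normalised LMS: a gradient step on the squared error, scaled
    by the squared input norm (the form XCSF uses for straight lines)."""
    return w + gamma * (y - w @ x) * x / (x @ x + eps)

# constant-input LMS settles near the mean of the observed outputs
w = 0.0
for y in [1.0, 3.0, 2.0, 2.0] * 50:
    w = lms_constant(w, y)

# NLMS tracks the noiseless straight line y = 1 + 2x
rng = np.random.default_rng(1)
v = np.zeros(2)
for _ in range(2000):
    x = np.array([1.0, rng.uniform(-1, 1)])  # bias term plus input
    v = nlms(v, x, 1.0 + 2.0 * x[1])
```

Setting x_n = 1 in the NLMS recursion recovers the constant-input case, which is what links the original XCS update to the later straight-line extension.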
Simultaneously, and similar to the discussion in Sect. 5.3.4, the convergence
of gradient-based methods was identified as a problem [142, 143], with a
discussion based on steepest gradient descent rather than the NLMS method. As an
alternative, the RLS algorithm was proposed to estimate the weight vector, but
the aim of a classifier was specified without considering matching, and matching
was implemented by only updating the classifier's parameter if that classifier
matches the current input. While this is a valid procedure from the algorithmic
perspective, it does not make matching explicit in the classifier's aim, and cannot
deal with matching to a degree. The aim formulation (5.5), in contrast, provides
both features and thereby leads to a better understanding and greater flexibility
of the classifier model.
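A minimal sketch of how matching can be made explicit in the RLS update follows, assuming the standard weighted RLS recursion with each observation weighted by its matching degree m in [0, 1]; a degree of 0 leaves the estimate untouched, so the update-only-if-matched scheme falls out as a special case. The binary matching condition and target line below are hypothetical.

```python
import numpy as np

def rls_update(w, P, x, y, m):
    """One match-weighted RLS step via the Sherman-Morrison identity.
    m is the matching degree in [0, 1]; m = 0 leaves (w, P) unchanged,
    and m = 1 is a plain RLS update."""
    if m == 0.0:
        return w, P
    Px = P @ x
    P = P - np.outer(Px, Px) / (1.0 / m + x @ Px)
    w = w + m * (P @ x) * (y - w @ x)
    return w, P

# toy run: the classifier matches only inputs with u > 0 and
# estimates the noiseless line y = 3u - 1 on its matched region
rng = np.random.default_rng(2)
w, P = np.zeros(2), np.eye(2) * 1e6   # weak prior: large covariance
for _ in range(500):
    u = rng.uniform(-1, 1)
    x = np.array([1.0, u])             # bias term plus input
    m = 1.0 if u > 0 else 0.0          # binary matching for simplicity
    w, P = rls_update(w, P, x, 3.0 * u - 1.0, m)
```

Fractional matching degrees plug into the same recursion unchanged, which is precisely the flexibility that the aim formulation (5.5) provides and the update-only-if-matched implementation does not.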
While XCSF weight estimation research did not stop at linear models [156,
175], the presented work was not extended beyond their realm, to avoid
introducing multiple local optima that make estimating the globally optimal
weight vector significantly more complicated. In addition, there is always the
trade-off between the complexity of the local models and that of the global
model to consider: if more powerful local models are used, fewer of them are
necessary to provide the same level of complexity of the global model, but
their increased complexity and power usually makes them harder to understand.
For these reasons, linear classifier models provide a good trade-off between
ease of training and power of the model, while still being relatively simple
to interpret.
In contrast to the large amount of research activity seeking to improve the
weight vector estimation method in XCS, its method of estimating the classifier
model quality based on the absolute rather than the squared error was left
untouched from the initial introduction of XCS until we questioned its validity
on the basis of the identified model aim [78], as also discussed in Sect. 5.3.7.
The modified error measure not only introduces consistency, but also allows
accurate tracking of the noise precision estimate with the method developed in
Sect. 5.3.7, as previously shown [78]. Loiacono et al. have shown that, used as
a drop-in replacement for the mean absolute error measure in XCSF, it indeed
improves
4 Despite the title “Extending XCSF Beyond Linear Approximation” of [141], the
underlying model is still linear.