the generalisation capabilities as it provides a more accurate and stable estimate
of the model quality of a classifier and subsequently a fitness estimate with the
same qualities [155].
Nonetheless, the linear regression training methods introduced in this chapter
are by no means to be interpreted as the ultimate methods for training the
classifier models. Rather, the procedure deployed in this chapter can be used
to adapt other parameter estimation techniques to LCS. Still, the RLS algorithm
is currently the best-known incremental method for tracking the optimal weight
estimate under the given assumptions, while simultaneously providing an accurate
estimate of the noise variance. Hence, given that one aims at minimising the
squared error (5.5), it should be the method of choice.
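To make the above concrete, the following is a minimal sketch of such an RLS update for a single classifier's linear model, including incremental tracking of the summed squared residuals via the product of pre- and post-update residuals, in the spirit of (5.68). The class name, the initialisation constant delta, and the matching-weighted bookkeeping are illustrative assumptions, not the book's notation:

```python
import numpy as np

class RLSClassifierModel:
    """Sketch: RLS weight update for one classifier, with
    incremental noise variance tracking (illustrative names)."""

    def __init__(self, dim, delta=1000.0):
        self.w = np.zeros(dim)        # weight vector estimate
        self.P = np.eye(dim) * delta  # inverse input covariance (large init)
        self.sse = 0.0                # summed squared residuals
        self.count = 0.0              # match-weighted sample count

    def update(self, x, y, m=1.0):
        """One incremental step with input x, output y, matching m."""
        if m <= 0.0:
            return
        Px = self.P @ x
        # Sherman-Morrison update of the inverse covariance
        self.P -= m * np.outer(Px, Px) / (1.0 + m * x @ Px)
        pre = y - self.w @ x          # residual before the weight update
        self.w += m * (self.P @ x) * pre
        post = y - self.w @ x         # residual after the weight update
        # the pre/post residual product gives the exact incremental
        # change of the summed squared error under RLS
        self.sse += m * pre * post
        self.count += m

    def noise_variance(self):
        return self.sse / max(self.count, 1.0)
```

The key property exploited here is that, under RLS, the summed squared error can be updated exactly from the residuals before and after each weight update, so no separate pass over the data is needed to estimate the noise variance.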
As an alternative to the squared error, which corresponds to the assumption of
Gaussian noise, one can instead aim at estimating the weight vector that
minimises the mean absolute error (5.75) [157]. This, however, requires
modifying the assumptions about the distributions of the different linear model
variables. Additionally, there is currently no known method that incrementally
tracks the optimal weight estimate in the way RLS does for the squared error
measure. This also means that (5.68) cannot be used to track the model error,
and slower gradient-based alternatives have to be applied.
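As a sketch of such a gradient-based alternative, a stochastic subgradient step on the absolute error |y - w.x| amounts to a sign-based update; the step size gamma and the function name are hypothetical choices, not the book's method:

```python
import numpy as np

def abs_error_step(w, x, y, m=1.0, gamma=0.01):
    """One stochastic subgradient step towards minimising the
    absolute error |y - w.x| for a matched observation; m is the
    matching degree, gamma an illustrative step size."""
    return w + gamma * m * np.sign(y - w @ x) * x
```

Since no counterpart of (5.68) exists for this error measure, the mean absolute error itself would have to be estimated separately, for example by a running average of the absolute residuals.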
With respect to classification, the training of an appropriate LCS model has
been discussed for both batch and incremental training. The method differs from
current XCS-based LCS, such as UCS [161], in that it neither requires augmenting
the input space by a separate class label (see Sect. 3.1.3), nor evaluates
classifiers based on how accurately their associated class is represented within
their matched area of the input space. Instead, no assumptions are made about
which class is modelled by a classifier, and the probability of having generated
the observations of either class is estimated. This estimate can additionally be
used to measure the quality of a classifier, based on the idea that good
classifiers predict a single class with high probability. This concept was first
applied in an XCS-like context by Dam, Abbass, and Lokan in a Bayesian
formulation for two-class classification problems, resulting in improved
performance and faster learning [67]. Further evaluation and extensions to
multi-class problems are still pending.
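A minimal sketch of this idea, assuming a batch setting: estimate per-classifier class probabilities from the matched observations and use the largest probability as a quality measure. The Laplace-style prior alpha and the function name are illustrative assumptions:

```python
import numpy as np

def class_probabilities(labels, matching, n_classes, alpha=1.0):
    """Estimate the probability that a classifier's matched
    observations belong to each class. labels: integer class per
    observation; matching: degree to which the classifier matches
    each observation; alpha: illustrative smoothing prior."""
    counts = np.full(n_classes, alpha)
    for y, m in zip(labels, matching):
        counts[y] += m
    p = counts / counts.sum()
    # a good classifier predicts a single class with high
    # probability, so max(p) can serve as a quality measure
    return p, p.max()
```

Note that nothing here assigns a class to the classifier in advance; the class distribution is estimated from the data, and the quality measure follows directly from it.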
A later chapter reconsiders the probabilistic structure of both the linear
regression and classification models, and shows how the development of a
probabilistic approach allows the model to be embedded in a fully Bayesian
framework that, in the regression case, also lends itself to multi-dimensional
output spaces. Before that, let us in the following chapter consider another
LCS component that, in contrast to the weight vector estimate of XCS, has
received hardly any attention in LCS research: how the local models provided
by the classifiers are combined to form a global model.