itself is a well-known problem with a multitude of approaches that go far
beyond the ones described in this chapter. Nonetheless, it is usually not stated
as such in the LCS literature, nor approached from first principles.
Additional novelties in the LCS context are a probabilistic interpretation of the
linear model and its noise structure, the resulting explicit formulation of the
predictive density, and rigorous batch and incremental estimates of the noise
variance.
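To make the batch noise-variance estimate concrete, the following is a minimal sketch, not the book's own code: it assumes the standard match-weighted unbiased estimator for a linear model (weighted squared residuals divided by the effective matched count minus the input dimensionality), with the function name and toy data being illustrative choices.

```python
import numpy as np

def batch_noise_variance(X, y, w, m):
    """Match-weighted unbiased batch estimate of a linear classifier's
    noise variance: weighted squared residuals over the effective
    sample count minus the input dimensionality (assumed form)."""
    r = y - X @ w                # residuals of the fitted linear model
    D = X.shape[1]               # input dimensionality
    return float(np.sum(m * r**2) / (np.sum(m) - D))

# toy data: y = 0.5 + 2x plus Gaussian noise with std 0.1,
# all observations fully matched (m = 1)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.uniform(-1, 1, 100)])
y = X @ np.array([0.5, 2.0]) + rng.normal(0.0, 0.1, 100)
w, *_ = np.linalg.lstsq(X, y, rcond=None)
m = np.ones(100)                 # matching degree per observation
var = batch_noise_variance(X, y, w, m)
```

With fully matched data this reduces to the familiar unbiased residual-variance estimate; partial matching degrees simply down-weight observations the classifier only partly matches.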
The weight update of the original XCS conforms to (5.25) with x_n = 1 for
n > 0, and hence aims at minimising the squared error (5.5). Later, XCS was
modified to act as a regression model [240], and extended to XCSF to model
straight lines [241] by using the NLMS update (5.29), again without explicitly
stating a single classifier's aim. In a similar manner, the classifier model was
extended to a full linear model [141]^4.
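The two update rules discussed above can be sketched as follows. This is an illustration under the assumption that (5.25) is the standard LMS recursion and (5.29) the standard normalised LMS recursion; the step sizes and toy targets are arbitrary choices, not taken from the text.

```python
import numpy as np

def lms_constant(w, y, gamma=0.1):
    """LMS with constant input x_n = 1: the classifier tracks a
    running estimate of the mean output, as in the original XCS."""
    return w + gamma * (y - w)

def nlms(w, x, y, gamma=0.1, eps=1e-12):
    """Normalised LMS: a gradient step on the squared error, scaled
    by the squared input norm (the form XCSF uses for straight lines)."""
    return w + gamma * (y - w @ x) * x / (x @ x + eps)

# constant-input LMS settles near the mean of the observed outputs
w = 0.0
for y in [1.0, 3.0, 2.0, 2.0] * 50:
    w = lms_constant(w, y)

# NLMS tracks the noiseless straight line y = 1 + 2x
rng = np.random.default_rng(1)
v = np.zeros(2)
for _ in range(2000):
    x = np.array([1.0, rng.uniform(-1, 1)])  # bias term plus input
    v = nlms(v, x, 1.0 + 2.0 * x[1])
```

Setting x_n = 1 in the NLMS recursion recovers the constant-input case, which is what links the original XCS update to the later straight-line extension.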
Simultaneously, and similar to the discussion in Sect. 5.3.4, the convergence
of gradient-based methods was identified as a problem [142, 143], with a
discussion based on steepest gradient descent rather than the NLMS method. As an
alternative, the RLS algorithm was proposed to estimate the weight vector, but
the aim of a classifier was specified without considering matching, and matching
was implemented by only updating the classifier's parameter if that classifier
matches the current input. While this is a valid procedure from the algorithmic
perspective, it does not make matching explicit in the classifier's aim, and cannot
deal with matching to a degree. The aim formulation (5.5), in contrast, provides
both features and thereby leads to a better understanding and greater flexibility
of the classifier model.
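A minimal sketch of how matching can be made explicit in the RLS update follows, assuming the standard weighted RLS recursion with each observation weighted by its matching degree m in [0, 1]; a degree of 0 leaves the estimate untouched, so the update-only-if-matched scheme falls out as a special case. The binary matching condition and target line below are hypothetical.

```python
import numpy as np

def rls_update(w, P, x, y, m):
    """One match-weighted RLS step via the Sherman-Morrison identity.
    m is the matching degree in [0, 1]; m = 0 leaves (w, P) unchanged,
    and m = 1 is a plain RLS update."""
    if m == 0.0:
        return w, P
    Px = P @ x
    P = P - np.outer(Px, Px) / (1.0 / m + x @ Px)
    w = w + m * (P @ x) * (y - w @ x)
    return w, P

# toy run: the classifier matches only inputs with u > 0 and
# estimates the noiseless line y = 3u - 1 on its matched region
rng = np.random.default_rng(2)
w, P = np.zeros(2), np.eye(2) * 1e6   # weak prior: large covariance
for _ in range(500):
    u = rng.uniform(-1, 1)
    x = np.array([1.0, u])             # bias term plus input
    m = 1.0 if u > 0 else 0.0          # binary matching for simplicity
    w, P = rls_update(w, P, x, 3.0 * u - 1.0, m)
```

Fractional matching degrees plug into the same recursion unchanged, which is precisely the flexibility that the aim formulation (5.5) provides and the update-only-if-matched implementation does not.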
While XCSF weight estimation research did not stop at linear models [156,
175], the presented work was not extended beyond their realm, to avoid
introducing multiple local optima that make estimating the globally optimal
weight vector significantly more complicated. In addition, there is always the
trade-off between the complexity of the local models and that of the global
model to consider: if more powerful local models are used, fewer of them are
necessary to provide the same level of complexity of the global model, but
their increased complexity and power usually makes them harder to understand.
For these reasons, linear classifier models provide a good trade-off between
ease of training and power of the model, while still being relatively simple
to interpret.
In contrast to the large amount of research activity seeking to improve the
weight vector estimation method in XCS, its method of estimating the classifier
model quality based on the absolute rather than the squared error was left
untouched from the initial introduction of XCS until we questioned its validity
on the basis of the identified model aim [78], as also discussed in Sect. 5.3.7.
The modified error measure not only introduces consistency, but also allows
accurate tracking of the noise precision estimate with the method developed in
Sect. 5.3.7, as previously shown [78]. Loiacono et al. have shown that, used as
a drop-in replacement for the mean absolute error measure in XCSF, it indeed
improves
4 Despite the title “Extending XCSF Beyond Linear Approximation” of [141], the
underlying model is still linear.