Squared Error or Absolute Error?
XCSF (of which XCS is a special case) initially applied the NLMS method (5.29) [237], and later the RLS algorithm given by (5.34) and (5.35) [142, 143], to estimate the weight vector. The classifier's estimation error is tracked by the LMS update
$$
\tau^{-1}_{N+1} = \tau^{-1}_N + \gamma_{N+1}\, m(\mathbf{x}_{N+1}) \left( \left| \hat{\mathbf{w}}_{N+1}^T \mathbf{x}_{N+1} - y_{N+1} \right| - \tau^{-1}_N \right), \qquad (5.73)
$$
to perform, after $N$ observations, stochastic incremental gradient descent on the error function

$$
\sum_{n=1}^{N} m(\mathbf{x}_n) \left( \left| \hat{\mathbf{w}}_N^T \mathbf{x}_n - y_n \right| - \tau^{-1} \right)^2 . \qquad (5.74)
$$
Therefore, the error that is estimated is the mean absolute error

$$
c_N^{-1} \sum_{n=1}^{N} m(\mathbf{x}_n) \left| \hat{\mathbf{w}}_N^T \mathbf{x}_n - y_n \right| , \qquad (5.75)
$$
rather than the MSE (5.62). Thus, XCSF does not estimate the error that its weight vector estimate aims at minimising, and this inconsistency is nowhere justified - probably because the errors that are minimised had never before been explicitly expressed. While there is no systematic study that compares using (5.62) rather than (5.75) as the classifier error estimate in XCSF, we have recommended in [155] to use the MSE, both for consistency and for easier tracking by (5.68), and - as shown here - because it then has a probabilistic interpretation as the estimate of the noise precision τ of the linear model.
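The practical consequence is easy to demonstrate numerically: the two trackers settle on different values even when fed the same residuals. The following sketch assumes a converged weight vector whose residuals are pure Gaussian noise, for which the mean absolute error is σ√(2/π) while the MSE is σ²; the function names and the step size value are illustrative and not taken from the cited works.

```python
import numpy as np

def mae_lms_update(tau_inv, abs_err, match, gamma):
    """LMS-style tracking of the mean absolute error, as in (5.73)."""
    return tau_inv + gamma * match * (abs_err - tau_inv)

def mse_avg_update(tau_inv, sq_err, match, c):
    """Matched incremental average of squared errors, estimating the MSE (5.62)."""
    c = c + match
    return tau_inv + (match / c) * (sq_err - tau_inv), c

rng = np.random.default_rng(0)
sigma = 2.0                       # standard deviation of the residual noise
mae_est, mse_est, c = 0.0, 0.0, 0.0
for _ in range(100_000):
    residual = rng.normal(0.0, sigma)   # residual of a converged model
    mae_est = mae_lms_update(mae_est, abs(residual), 1.0, 0.005)
    mse_est, c = mse_avg_update(mse_est, residual ** 2, 1.0, c)

print(round(mae_est, 2))   # close to sigma * sqrt(2 / pi), about 1.60
print(round(mse_est, 2))   # close to sigma ** 2 = 4.00
```

For σ = 2 the first tracker settles near 1.60 and the second near 4.00, which confirms that (5.73) tracks the mean absolute error (5.75) rather than the MSE (5.62).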
5.3.8 Summarising Incremental Learning Approaches
Various approaches to estimating the weight vector and noise precision of the linear model (5.3) have been introduced. While gradient-based methods, such as LMS or NLMS, are computationally cheap, they require problem-dependent tuning of the step size and might feature slow convergence to the optimal estimates. RLS and Kalman filter approaches, on the other hand, scale at best with $O(D_X^2)$, but are able to accurately track both the optimal weight vector estimate and its associated noise precision estimate simultaneously.
Table 5.1 gives a summary of all methods introduced in this chapter (omitting the recency-weighted variants), together with their computational complexity. As can be seen, this complexity depends exclusively on the size of the input vectors as used by the classifier model (in contrast to their use for matching). For averaging classifiers we have $D_X = 1$, and thus all methods have equal complexity. In this case, the RLS algorithm with direct noise precision tracking should always be applied. For higher-dimensional input spaces, the choice of algorithm depends on the available computational resources, but the RLS approach should always be given a strong preference.
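To make the recommended combination concrete, below is a minimal sketch of an RLS weight update with direct noise precision tracking, using the standard matched RLS identities, including the exact recursive update of the summed squared error. The class name, the initialisation constant `delta`, and the treatment of matching as a scalar weight are assumptions for illustration, not details of any cited implementation.

```python
import numpy as np

class RLSModel:
    """Sketch: RLS linear model with direct noise precision tracking."""

    def __init__(self, d_x, delta=1000.0):
        self.w = np.zeros(d_x)         # weight vector estimate
        self.P = delta * np.eye(d_x)   # inverse input correlation matrix
        self.sse = 0.0                 # matched sum of squared errors
        self.c = 0.0                   # match count c_N

    def update(self, x, y, match=1.0):
        if match == 0.0:
            return
        Px = self.P @ x
        k = (match * Px) / (1.0 + match * (x @ Px))   # gain vector
        pre_err = y - self.w @ x       # residual before the weight update
        self.w = self.w + k * pre_err
        self.P = self.P - np.outer(k, Px)             # Sherman-Morrison step
        post_err = y - self.w @ x      # residual after the weight update
        # exact recursive tracking of the summed squared error
        self.sse += match * pre_err * post_err
        self.c += match

    def noise_precision(self):
        """tau estimate: match count divided by summed squared error."""
        return self.c / self.sse if self.sse > 0.0 else float("inf")
```

Each update costs $O(D_X^2)$ operations for the covariance step, in line with the complexity discussed above; for averaging classifiers ($D_X = 1$) it reduces to scalar arithmetic.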