Squared Error or Absolute Error?
XCSF (of which XCS is a special case) initially applied the NLMS method (5.29) [237], and later the RLS algorithm given by (5.34) and (5.35) [142, 143], to estimate the weight vector. The classifier's estimation error is tracked by the LMS update
$$
\tau^{-1}_{N+1} = \tau^{-1}_N + \gamma_{N+1}\, m(\mathbf{x}_{N+1}) \left( \left| \hat{\mathbf{w}}_{N+1}^T \mathbf{x}_{N+1} - y_{N+1} \right| - \tau^{-1}_N \right), \qquad (5.73)
$$
to perform, after $N$ observations, stochastic incremental gradient descent on the error function

$$
\sum_{n=1}^{N} m(\mathbf{x}_n) \left( \left| \hat{\mathbf{w}}_N^T \mathbf{x}_n - y_n \right| - \tau^{-1} \right)^2 . \qquad (5.74)
$$
Therefore, the error that is estimated is the mean absolute error

$$
c_N^{-1} \sum_{n=1}^{N} m(\mathbf{x}_n) \left| \hat{\mathbf{w}}_N^T \mathbf{x}_n - y_n \right| , \qquad (5.75)
$$
rather than the MSE (5.62). Thus, XCSF does not estimate the error that its weight vector estimate aims at minimising, and this inconsistency is nowhere justified - probably because the errors that are minimised had never before been explicitly expressed. While there is no systematic study that compares using (5.62) rather than (5.75) as the classifier error estimate in XCSF, we have recommended in [155] to use the MSE, both for consistency and for easier tracking by (5.68), and - as shown here - because it then has a probabilistic interpretation as the estimate of the noise precision τ of the linear model.
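The practical consequence is easy to demonstrate numerically: the two trackers settle on different values even when fed the same residuals. The following sketch assumes a converged weight vector whose residuals are pure Gaussian noise, for which the mean absolute error is σ√(2/π) while the MSE is σ²; the function names and the step size value are illustrative and not taken from the cited works.

```python
import numpy as np

def mae_lms_update(tau_inv, abs_err, match, gamma):
    """LMS-style tracking of the mean absolute error, as in (5.73)."""
    return tau_inv + gamma * match * (abs_err - tau_inv)

def mse_avg_update(tau_inv, sq_err, match, c):
    """Matched incremental average of squared errors, estimating the MSE (5.62)."""
    c = c + match
    return tau_inv + (match / c) * (sq_err - tau_inv), c

rng = np.random.default_rng(0)
sigma = 2.0                       # standard deviation of the residual noise
mae_est, mse_est, c = 0.0, 0.0, 0.0
for _ in range(100_000):
    residual = rng.normal(0.0, sigma)   # residual of a converged model
    mae_est = mae_lms_update(mae_est, abs(residual), 1.0, 0.005)
    mse_est, c = mse_avg_update(mse_est, residual ** 2, 1.0, c)

print(round(mae_est, 2))   # close to sigma * sqrt(2 / pi), about 1.60
print(round(mse_est, 2))   # close to sigma ** 2 = 4.00
```

For σ = 2 the first tracker settles near 1.60 and the second near 4.00, which confirms that (5.73) tracks the mean absolute error (5.75) rather than the MSE (5.62).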
5.3.8 Summarising Incremental Learning Approaches
Various approaches to estimating the weight vector and noise precision of the linear model (5.3) have been introduced. While gradient-based methods, such as LMS or NLMS, are computationally cheap, they require problem-dependent tuning of the step size and might feature slow convergence to the optimal estimates. RLS and Kalman filter approaches, on the other hand, scale at best with $O(D_X^2)$, but are able to accurately track both the optimal weight vector estimate and its associated noise precision estimate simultaneously.
Table 5.1 gives a summary of all methods introduced in this chapter (omitting the recency-weighted variants), together with their computational complexity. As can be seen, this complexity depends exclusively on the size of the input vectors as used by the classifier model (in contrast to their use for matching). For averaging classifiers we have $D_X = 1$, and thus all methods have equal complexity. In this case, the RLS algorithm with direct noise precision tracking should always be applied. For higher-dimensional input spaces, the choice of algorithm depends on the available computational resources, but the RLS approach should always be given a strong preference.
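To make the recommended combination concrete, below is a minimal sketch of an RLS weight update with direct noise precision tracking, using the standard matched RLS identities, including the exact recursive update of the summed squared error. The class name, the initialisation constant `delta`, and the treatment of matching as a scalar weight are assumptions for illustration, not details of any cited implementation.

```python
import numpy as np

class RLSModel:
    """Sketch: RLS linear model with direct noise precision tracking."""

    def __init__(self, d_x, delta=1000.0):
        self.w = np.zeros(d_x)         # weight vector estimate
        self.P = delta * np.eye(d_x)   # inverse input correlation matrix
        self.sse = 0.0                 # matched sum of squared errors
        self.c = 0.0                   # match count c_N

    def update(self, x, y, match=1.0):
        if match == 0.0:
            return
        Px = self.P @ x
        k = (match * Px) / (1.0 + match * (x @ Px))   # gain vector
        pre_err = y - self.w @ x       # residual before the weight update
        self.w = self.w + k * pre_err
        self.P = self.P - np.outer(k, Px)             # Sherman-Morrison step
        post_err = y - self.w @ x      # residual after the weight update
        # exact recursive tracking of the summed squared error
        self.sse += match * pre_err * post_err
        self.c += match

    def noise_precision(self):
        """tau estimate: match count divided by summed squared error."""
        return self.c / self.sse if self.sse > 0.0 else float("inf")
```

Each update costs $O(D_X^2)$ operations for the covariance step, in line with the complexity discussed above; for averaging classifiers ($D_X = 1$) it reduces to scalar arithmetic.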