$$w = c_k^{-1} \sum_{n=1}^{N} m_k(x_n)\, y_n, \tag{6.31}$$
given all N observations. Hence, (6.31) is the batch formulation for the solution
that the incremental (6.30) approximates.
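As a quick numeric check (not from the text), the following sketch assumes the incremental update (6.30) takes the usual match-weighted running-average form and verifies that it reproduces the batch solution (6.31); the matching values drawn here are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=1000)        # targets y_n
m = rng.uniform(size=1000)       # assumed matching values m_k(x_n) in [0, 1]

# Batch solution (6.31): w = c_k^{-1} sum_n m_k(x_n) y_n, with c_k = sum_n m_k(x_n)
w_batch = (m * y).sum() / m.sum()

# Match-weighted running average (assumed form of the incremental update (6.30))
w, c = 0.0, 0.0
for m_n, y_n in zip(m, y):
    if m_n == 0.0:
        continue                 # unmatched input: no update
    c += m_n
    w += (m_n / c) * (y_n - w)

print(np.isclose(w, w_batch))    # True: the incremental run reproduces the batch value
```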
Applying this relation to the XCS update equations for the mixing parameters, the mixing model employed by XCS(F) can be described as follows: the error $\epsilon_k$ of classifier $k$ in XCS(F) is the mean absolute prediction error of its local model, and is given by
$$\epsilon_k = c_k^{-1} \sum_{n=1}^{N} m_k(x_n) \left| y_n - \hat{w}_k^\top x_n \right|. \tag{6.32}$$
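In vectorised form, (6.32) can be computed for all classifiers at once. The sketch below is illustrative only; the matrix layout and all names are assumptions, not the book's notation.

```python
import numpy as np

def mean_abs_errors(M, X, Y, W):
    """Mean absolute prediction errors eps_k of Eq. (6.32).
    M: (N, K) matching values m_k(x_n); X: (N, D) inputs;
    Y: (N,) targets; W: (K, D) local model weight vectors w_k."""
    resid = np.abs(Y[:, None] - X @ W.T)            # (N, K): |y_n - w_k^T x_n|
    return (M * resid).sum(axis=0) / M.sum(axis=0)  # divide by match counts c_k
```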
The classifier's accuracy is some inverse function $\kappa(\epsilon_k)$ of the classifier error. This function was initially given by an exponential [237], but was later [239, 57] redefined to
$$\kappa(\epsilon) = \begin{cases} 1 & \text{if } \epsilon < \epsilon_0, \\ \alpha \left( \epsilon / \epsilon_0 \right)^{-\nu} & \text{otherwise,} \end{cases} \tag{6.33}$$
where the constant scalar $\epsilon_0$ is known as the minimum error, the constant $\alpha$ is a scaling factor, and the constant $\nu$ is a mixing power factor [57]. The accuracy is constantly 1 up to the error $\epsilon_0$ and then drops off steeply, with the shape of the drop determined by $\alpha$ and $\nu$. The relative accuracy is a classifier's accuracy for
a single input normalised by the sum of the accuracies of all classifiers matching
that input. The fitness is the relative accuracy of a classifier averaged over all
inputs that it matches, that is
$$F_k = c_k^{-1} \sum_{n=1}^{N} \frac{m_k(x_n)\, \kappa(\epsilon_k)}{\sum_{j=1}^{K} m_j(x_n)\, \kappa(\epsilon_j)}. \tag{6.34}$$
This fitness is the measure of a classifier's prediction quality, and hence $\gamma_k$ is given, independently of the input, by $\gamma_k(x) = F_k$.
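Putting (6.33) and (6.34) together, a minimal self-contained sketch of the accuracy and fitness computations might look as follows. All names and parameter defaults are hypothetical; the matching matrix and errors $\epsilon_k$ are assumed given (the latter as in (6.32)), and every input is assumed to be matched by at least one classifier, as XCS's covering mechanism ensures.

```python
import numpy as np

def accuracy(errors, eps0=0.01, alpha=0.1, nu=5.0):
    """Accuracy kappa(eps) of Eq. (6.33): constantly 1 below the minimum
    error eps0, then a power-law drop shaped by alpha and nu."""
    errors = np.asarray(errors, dtype=float)
    return np.where(errors < eps0, 1.0, alpha * (errors / eps0) ** (-nu))

def fitness(M, errors, **kappa_args):
    """Fitness F_k of Eq. (6.34), used as the input-independent mixing
    weight gamma_k(x) = F_k.
    M: (N, K) matching values m_k(x_n); errors: (K,) errors eps_k."""
    kappa = accuracy(errors, **kappa_args)
    num = M * kappa                             # m_k(x_n) * kappa(eps_k)
    rel = num / num.sum(axis=1, keepdims=True)  # relative accuracy per input
    c = M.sum(axis=0)                           # match counts c_k
    return rel.sum(axis=0) / c                  # average over matched inputs

# Example: two classifiers, the second with an error well above eps0
M = np.array([[1.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
print(fitness(M, errors=[0.005, 0.05]))  # roughly [1.0, 3e-5]
```

As the example illustrates, with typical parameter settings the power-law drop of (6.33) pushes the accuracy, and hence the relative accuracy share, of a classifier with an error even moderately above $\epsilon_0$ close to zero.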
Note that the magnitude of a relative accuracy depends on both the error
of a classifier, and on the error of the classifiers that match the same input.
This makes the fitness of classifier k dependent on inputs that are matched by
classifiers that share inputs with classifier k , but are not necessarily matched by
this classifier. This might be a good measure of a classifier's fitness (where prediction quality is not all that counts), but it performs poorly as a measure of the classifier's prediction quality alone.
6.3 Empirical Comparison
In order to compare how well the different heuristics perform with respect to the
aim of maximising (6.1), their performance is evaluated on a set of four regression
tasks. The results show that i) mixing by inverse variance outperforms the other
 