combination of all classifiers. While this by itself is not necessarily a drawback, the need to re-train overlapping classifiers whenever classifiers are added or removed while searching the model structure space is a clear disadvantage of the linear structure, and generally of all structures that do not train classifiers independently. Also, due to the interaction of overlapping classifiers, there is no clear indicator of the quality of a single classifier. LCS instances that use this agglomerating structure are ZCS [236], as identified by Wada et al. [224], and an LCS developed by Booker [23]. In both cases, the quality measure of classifier $k$ is a measure of the magnitude of its parameters $w_k$, a method called “fitness sharing” in the case of ZCS$^6$.
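To make the agglomerating structure concrete, the following minimal sketch in Python evaluates the unweighted sum $\sum_k m_k(x) w_k^\top x$ together with a magnitude-based quality measure, assuming binary matching functions $m_k(x) \in \{0, 1\}$. The matching functions and weight vectors are hypothetical, chosen only to illustrate the structure; this is not the actual ZCS or Booker implementation.

```python
import numpy as np

def agglomerated_prediction(x, matching, weights):
    """f_M(x; theta) = sum_k m_k(x) w_k^T x, with m_k(x) in {0, 1}."""
    return sum(m_k(x) * w_k @ x for m_k, w_k in zip(matching, weights))

def magnitude_quality(weights):
    """Classifier quality read off as the magnitude of its parameters w_k."""
    return [np.linalg.norm(w_k) for w_k in weights]

# Two overlapping hypothetical classifiers over the feature vector x = (1, x_1).
matching = [lambda x: 1.0,                          # matches all inputs
            lambda x: 1.0 if x[1] > 0.5 else 0.0]   # matches x_1 > 0.5
weights = [np.array([0.2, 1.0]), np.array([-0.1, 0.4])]

x = np.array([1.0, 0.7])
print(agglomerated_prediction(x, matching, weights))  # 0.9 + 0.18 = 1.08
print(magnitude_quality(weights))                     # ~[1.02, 0.41]
```

Because the output is the plain sum of all matching classifiers' predictions, overlapping classifiers jointly form the prediction and none of them can be judged, or re-trained, in isolation.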
An alternative to agglomerating classifiers in linear models is to average over them by using $g_k(x) = m_k(x) / \sum_{k'} m_{k'}(x)$, such that (4.25) becomes

$$f_M(x; \theta) = \sum_{k=1}^{K} \frac{m_k(x)}{\sum_{k'} m_{k'}(x)} w_k^\top x. \qquad (4.27)$$
Note that this form is very similar to the gating network (4.22) of the generalised MoE, with the difference that the average is not weighted by the quality of the classifiers' predictions. Thus, the fit of this model will certainly be worse than the weighted averaging of the generalised MoE. Also, even though the predictions of overlapping classifiers no longer directly depend on each other, the value of $g_k(x)$ still depends on the other classifiers matching the same input $x$. Thus, classifiers are not trained independently, and they need to be re-trained when overlapping classifiers are removed or added. An instance of this form of linear LCS was introduced by Wada et al. as a linearised version of XCS [223].
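As a minimal sketch of this averaged form, the snippet below (reusing the hypothetical classifiers from the previous example) computes $g_k(x) = m_k(x) / \sum_{k'} m_{k'}(x)$ and the prediction (4.27). The second call shows that removing an overlapping classifier changes the remaining $g_k(x)$, which is precisely why classifiers cannot be trained independently under this structure.

```python
import numpy as np

def averaged_prediction(x, matching, weights):
    # Assumes at least one classifier matches x, i.e. m.sum() > 0.
    m = np.array([m_k(x) for m_k in matching])
    g = m / m.sum()                 # g_k(x) = m_k(x) / sum_k' m_k'(x)
    return sum(g_k * w_k @ x for g_k, w_k in zip(g, weights))

matching = [lambda x: 1.0,
            lambda x: 1.0 if x[1] > 0.5 else 0.0]
weights = [np.array([0.2, 1.0]), np.array([-0.1, 0.4])]

x = np.array([1.0, 0.7])
print(averaged_prediction(x, matching, weights))          # g = [0.5, 0.5] -> 0.54
print(averaged_prediction(x, matching[:1], weights[:1]))  # g = [1.0]     -> 0.9
```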
It needs to be emphasised that this section is not meant to demonstrate the superiority of the introduced LCS model and its currently used instances over LCS based on linear models. Rather, it attempts to point out significant differences between these two model types and their consequences. Having a linear model structure removes the need for an explicit mixing model and simplifies finding the model parameters for a fixed model structure, but this comes at the price of having to re-train the model once this structure changes. Using non-linear models, on the other hand, requires a mixing model and the introduction of independent classifier training (as a rather unsatisfying solution) to simplify the training of a single model structure, but it simplifies changing this structure and provides a clearer interpretation of the model formed by a single classifier.
$^6$ It is not clear whether such a quality measure is indeed useful on all occasions. Booker proposed to consider classifiers with low parameter values as bad classifiers, as “The ones with large weights are the most important terms in the approximation” [24], but would that also work in cases where low parameter values are actually good parameter values? One can easily imagine a part of a function that is constantly zero and thus requires zero-valued parameters to model it.
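The footnote's scenario is easy to reproduce. In the hypothetical sketch below, two classifiers are fit by least squares to disjoint regions of a target function that is zero on one region and linear on the other; the classifier covering the zero region obtains (correctly) near-zero weights, so a magnitude-based quality measure would rank this perfectly fitting classifier worst.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(0.0, 1.0, 100)
X = np.column_stack([np.ones_like(x1), x1])   # features (1, x_1)
y = np.where(x1 > 0.5, x1, 0.0)               # target: 0 for x_1 <= 0.5

for lo, hi in [(0.0, 0.5), (0.5, 1.0)]:       # two non-overlapping classifiers
    idx = (x1 > lo) & (x1 <= hi)
    w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    print((lo, hi), np.round(w, 3), "||w|| =", round(np.linalg.norm(w), 3))
```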
 