6.2.4 Maximum Prediction Confidence
The global model density is given by (4.8) as a mixture of the densities of
the local models. As for the local models, the spread of the global prediction
determines a confidence interval on the global model. Minimising the spread
of the global prediction maximises its confidence. Due to mixing by weighted
average, the spread of the global density is bounded from below and above by
the smallest and the largest spread of the contributing classifiers. Thus, in order
to minimise the spread of the global prediction, we only consider the predictive
density of the classifier with the smallest predictive spread.
Using this concept, mixing to maximise the prediction confidence is formalised
by setting $\gamma_k(\mathbf{x})$ to 1 only for the classifier with the lowest prediction spread,
that is,

$$
\gamma_k(\mathbf{x}) =
\begin{cases}
1 & \text{if } k = \arg\max_{\bar{k}} \, m_{\bar{k}}(\mathbf{x}) \, \tau_{\bar{k}} \left( \mathbf{x}^{\mathrm{T}} \boldsymbol{\Lambda}_{\bar{k}}^{-1} \mathbf{x} + 1 \right)^{-1/2}, \\
0 & \text{otherwise},
\end{cases} \tag{6.29}
$$
Note the inclusion of $m_k(\mathbf{x})$, which ensures that the highest-confidence
classifier is picked from among those that match the input.
As for mixing by confidence, using only the classifier with the highest pre-
diction confidence relies on several assumptions that might be violated. Thus,
maximum confidence mixing can be expected to perform worse than mixing by
inverse variance in cases where these assumptions are violated. In such cases it
might even fare worse than mixing by confidence, as it relies on these assumptions
more heavily.
6.2.5 XCS
While none of the approaches discussed before are currently used in any LCS,
the mixing model used in XCS(F) is described here, for the sake of comparison,
within the same formal framework. Mixing in XCS(F) has not changed since it
was first specified in [237], despite the system's many other changes and improvements.
Additionally, the mixing model in XCS(F) is closely linked to the fitness of a
classifier as used by the genetic algorithm, and is thus overly complex. Since
XCS(F) is usually given as an algorithmic description of an incremental method,
its aims are not explicitly specified. Nonetheless, all mixing parameters in XCS(F)
are updated by the LMS method, for which the formally equivalent, but more
intuitive, batch approaches have already been discussed in the previous chapter.
Recall that the LMS algorithm for single-dimensional constant inputs is spe-
cified by (5.25) to update some scalar estimate $w$ of an output $y$ after observing
the $(N+1)$th output by

$$
w_{N+1} = w_N + \gamma_{N+1} \left( y_{N+1} - w_N \right), \tag{6.30}
$$

where $\gamma_{N+1}$ is some scalar step size. As shown in Example 5.2, this update
equation aims at minimising a sum of squared errors (5.5), whose minimum is
achieved by the sample mean $\hat{w} = N^{-1} \sum_{n=1}^{N} y_n$ of the observed outputs.