6.2.4 Maximum Prediction Confidence
The global model density is given by (4.8) as a mixture of the densities of
the local models. As for the local models, the spread of the global prediction
determines a confidence interval on the global model. Minimising the spread
of the global prediction maximises its confidence. Due to mixing by weighted
average, the spread of the global density is bounded from below and above by
the smallest and the largest spread of the contributing classifiers. Thus, in order
to minimise the spread of the global prediction, we only consider the predictive
density of the classifier with the smallest predictive spread.
Using this concept, mixing to maximise the prediction confidence is formalised
by setting $\gamma_k(\mathbf{x})$ to 1 only for the classifier with the lowest prediction spread,
that is,

$$
\gamma_k(\mathbf{x}) =
\begin{cases}
1 & \text{if } k = \arg\max_{\bar{k}} \, m_{\bar{k}}(\mathbf{x}) \, \tau_{\bar{k}} \left( \mathbf{x}^{\mathrm{T}} \boldsymbol{\Lambda}_{\bar{k}}^{-1} \mathbf{x} + 1 \right)^{-1/2}, \\
0 & \text{otherwise},
\end{cases} \tag{6.29}
$$
Note the inclusion of $m_k(\mathbf{x})$, which ensures that the highest-confidence
classifier is picked from among those that match the input.
As for mixing by confidence, using only the classifier with the highest pre-
diction confidence relies on several assumptions that might be violated. Thus,
maximum confidence mixing can be expected to perform worse than mixing by
inverse variance in cases where these assumptions are violated. In such cases it
might even fare worse than mixing by confidence, as it relies on these assumptions
more heavily.
6.2.5 XCS
While none of the approaches discussed before are currently used in any LCS,
the mixing model used in XCS(F) is described here, for the sake of comparison,
within the same formal framework. Mixing in XCS(F) has not changed since it
was first specified in [237], despite the system's many other changes and improvements.
Additionally, the mixing model in XCS(F) is closely linked to the fitness of a
classifier as used by the genetic algorithm, and is thus overly complex. Since
XCS(F) is usually given as an algorithmic description of an incremental method,
its aims are not explicitly specified. Nonetheless, all mixing parameters in XCS(F)
are updated by the LMS method, for which the formally equivalent, but more
intuitive, batch approaches have already been discussed in the previous chapter.
Recall that the LMS algorithm for single-dimensional constant inputs is spe-
cified by (5.25) to update some scalar estimate $w$ of an output $y$ after observing
the $(N+1)$th output by

$$
w_{N+1} = w_N + \gamma_{N+1} \left( y_{N+1} - w_N \right), \tag{6.30}
$$

where $\gamma_{N+1}$ is some scalar step size. As shown in Example 5.2, this update
equation aims at minimising a sum of squared errors (5.5), whose minimum is
achieved by the sample mean $\hat{w} = N^{-1} \sum_{n=1}^{N} y_n$ of the observed outputs.