The only terms that differ from the ones evaluated in Sect. 7.3.8 are the ones that contain W, and for the classification model they are given by
\[
E_{W,Z}(\ln p(Y | X, W, Z)) = \sum_n \sum_k r_{nk} \sum_j y_{nj} \big( \psi(\alpha_{kj}) - \psi(\tilde{\alpha}_k) \big), \tag{7.124}
\]
\[
E_W(\ln p(W)) = \sum_k \Big( \ln C(\boldsymbol{\alpha}) + \sum_j (\alpha_j - 1) \big( \psi(\alpha_{kj}) - \psi(\tilde{\alpha}_k) \big) \Big), \tag{7.125}
\]
\[
E_W(\ln q(W)) = \sum_k \Big( \ln C(\boldsymbol{\alpha}_k) + \sum_j (\alpha_{kj} - 1) \big( \psi(\alpha_{kj}) - \psi(\tilde{\alpha}_k) \big) \Big), \tag{7.126}
\]
where \(\tilde{\alpha}_k = \sum_j \alpha_{kj}\).
Splitting the variational bound again into the \(\mathcal{L}_k\)'s and \(\mathcal{L}_M\), \(\mathcal{L}_k(q)\) for classifier k is defined as
\[
\mathcal{L}_k(q) = E_{W,Z}(\ln p(Y | X, W, Z)) + E_W(\ln p(W)) - E_W(\ln q(W)), \tag{7.127}
\]
and evaluates to
\[
\mathcal{L}_k(q) = \ln C(\boldsymbol{\alpha}) - \ln C(\boldsymbol{\alpha}_k), \tag{7.128}
\]
where (7.120) was used to simplify the expression. \(\mathcal{L}_M(q)\) remains unchanged and is thus given by (7.95). As before, \(\mathcal{L}(q)\) is given by (7.96).
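The cancellation behind (7.128) can be made explicit. Substituting (7.124) through (7.126) into (7.127) and collecting, for each classifier k, the terms that multiply each digamma difference gives
\[
\mathcal{L}_k(q) = \ln C(\boldsymbol{\alpha}) - \ln C(\boldsymbol{\alpha}_k)
  + \sum_j \Big( \sum_n r_{nk} y_{nj} + \alpha_j - \alpha_{kj} \Big)
    \big( \psi(\alpha_{kj}) - \psi(\tilde{\alpha}_k) \big),
\]
and by the update (7.120), \(\alpha_{kj} = \alpha_j + \sum_n r_{nk} y_{nj}\), so the bracketed factor vanishes for every j, leaving (7.128).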
7.5.4 Independent Classifier Training
As before, the classifiers can be trained independently by replacing \(r_{nk}\) with \(m_k(x_n)\). This only influences the classifier weight vector update (7.120), which becomes
\[
\boldsymbol{\alpha}_k = \boldsymbol{\alpha} + \sum_n m_k(x_n) y_n. \tag{7.129}
\]
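This single-pass update can be sketched numerically. The following is a minimal illustration, not code from the text: the array names (`Y`, `M`, `alpha_prior`) and the random data are assumptions, with `Y` holding one-hot class labels \(y_n\) and `M` holding the matching values \(m_k(x_n)\).

```python
import numpy as np

# Hypothetical dimensions: N observations, K classifiers, Dy classes.
rng = np.random.default_rng(0)
N, K, Dy = 100, 3, 4
Y = np.eye(Dy)[rng.integers(0, Dy, N)]   # one-hot class labels y_n, shape (N, Dy)
M = rng.random((N, K))                   # matching values m_k(x_n) in [0, 1)
alpha_prior = np.full(Dy, 1.0)           # symmetric Dirichlet prior alpha

# Single-pass update (7.129): alpha_k = alpha + sum_n m_k(x_n) y_n,
# computed for all K classifiers at once; row k of alpha_post is alpha_k.
alpha_post = alpha_prior + M.T @ Y       # shape (K, Dy)
```

Because the update is a plain weighted count of the one-hot targets, no iteration over hyperpriors is needed, which is what makes the single pass possible.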
This change invalidates the simplifications performed to get \(\mathcal{L}_k(q)\) by (7.128). Instead,
\[
\mathcal{L}_k(q) = \ln C(\boldsymbol{\alpha}) - \ln C(\boldsymbol{\alpha}_k)
  + \sum_j \Big( \sum_n r_{nk} y_{nj} + \alpha_j - \alpha_{kj} \Big)
    \big( \psi(\alpha_{kj}) - \psi(\tilde{\alpha}_k) \big) \tag{7.130}
\]
has to be used.
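A sketch of evaluating (7.130) for one classifier, under the assumption that the function and argument names are mine (not the text's), with \(\ln C(\boldsymbol{\alpha}) = \ln \Gamma(\sum_j \alpha_j) - \sum_j \ln \Gamma(\alpha_j)\), the log normaliser of a Dirichlet:

```python
import numpy as np
from scipy.special import digamma, gammaln

def ln_C(alpha):
    """Dirichlet log normaliser: ln Gamma(sum alpha_j) - sum ln Gamma(alpha_j)."""
    return gammaln(alpha.sum()) - gammaln(alpha).sum()

def L_k(alpha_prior, alpha_k, resp_y_k):
    """Variational bound (7.130) for a single classifier k.

    alpha_prior : prior parameters alpha, length Dy
    alpha_k     : posterior parameters alpha_k, length Dy
    resp_y_k    : sum_n r_nk * y_nj, responsibility-weighted class counts
    """
    # Correction term of (7.130); it is zero when alpha_k obeys (7.120).
    corr = ((resp_y_k + alpha_prior - alpha_k)
            * (digamma(alpha_k) - digamma(alpha_k.sum()))).sum()
    return ln_C(alpha_prior) - ln_C(alpha_k) + corr
```

When \(\boldsymbol{\alpha}_k\) satisfies the batch update (7.120), the correction term vanishes and the function returns exactly (7.128); under independent training via (7.129) it generally does not.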
If classifiers are trained independently, then they can be trained in a single
pass by (7.129), as no hyperpriors are used. How the mixing model is trained
and the variational bound is evaluated remains unchanged and is described in
Sect. 7.3.10.
7.5.5 Predictive Density
Given a new observation \((y, x)\), its predictive density is given by \(p(y | x, \mathcal{D})\). The density's mixing-model component is essentially the same as in Sect. 7.4.
 