7.5.1 Local Classification Models and Their Priors
Taking the generative point of view, it is assumed that a single classifier $k$ generates each of the classes with a fixed probability, independent of the input. Thus, its model is, as already introduced in Sect. 4.2.2, given by

$$p(\mathbf{y} \mid \mathbf{w}_k) = \prod_j w_{kj}^{y_j}, \qquad \text{with} \quad \sum_j w_{kj} = 1. \tag{7.112}$$

$\mathbf{w}_k \in \mathbb{R}^{D_Y}$ is the parameter vector of that classifier, with each of its elements $w_{kj}$ modelling the generative probability for its associated class $j$. As a consequence, its elements have to be non-negative and sum up to 1.
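As a concrete illustration, here is a minimal sketch of Eq. (7.112) in Python, assuming the 1-of-$D_Y$ target coding implied by the model (exactly one element of $\mathbf{y}$ is 1, the rest are 0); the function name is illustrative, not from the text.

```python
import numpy as np

def classifier_likelihood(y, w_k):
    """p(y | w_k) = prod_j w_kj^{y_j} for a 1-of-D_Y coded target y.

    Since exactly one element of y is 1 and the rest are 0, the
    product simply picks out the probability of the observed class.
    """
    y, w_k = np.asarray(y), np.asarray(w_k)
    # The parameter vector must be a valid probability distribution.
    assert np.all(w_k >= 0.0) and np.isclose(w_k.sum(), 1.0)
    return np.prod(w_k ** y)

# Example: three classes, observed class is j = 1
w_k = np.array([0.2, 0.5, 0.3])
y = np.array([0, 1, 0])
print(classifier_likelihood(y, w_k))  # 0.5
```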
The conjugate prior $p(\mathbf{w}_k)$ on a classifier's parameters is given by the Dirichlet distribution

$$p(\mathbf{w}_k) = \mathrm{Dir}(\mathbf{w}_k \mid \boldsymbol{\alpha}) = C(\boldsymbol{\alpha}) \prod_j w_{kj}^{\alpha_j - 1}, \tag{7.113}$$
parametrised by the vector $\boldsymbol{\alpha} \in \mathbb{R}^{D_Y}$, which is taken to be the same for all classifiers, due to the lack of better knowledge. Its normalising constant $C(\boldsymbol{\alpha})$ is given by

$$C(\boldsymbol{\alpha}) = \frac{\Gamma(\tilde{\alpha})}{\prod_j \Gamma(\alpha_j)}, \tag{7.114}$$

where $\tilde{\alpha}$ denotes the sum of all elements of $\boldsymbol{\alpha}$, that is,

$$\tilde{\alpha} = \sum_j \alpha_j. \tag{7.115}$$
Given this prior, we have $\mathrm{E}(\mathbf{w}_k) = \boldsymbol{\alpha}/\tilde{\alpha}$, and thus the elements of $\boldsymbol{\alpha}$ allow us to specify a prior bias towards one or the other class. Usually, nothing is known about the class distribution for different areas of the input space, and so all elements of $\boldsymbol{\alpha}$ should be set to the same value.
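The following sketch evaluates Eqs. (7.113)–(7.115) numerically: the normalising constant $C(\boldsymbol{\alpha})$ is computed in log space via scipy's gammaln to avoid overflow, and the prior mean $\mathrm{E}(\mathbf{w}_k) = \boldsymbol{\alpha}/\tilde{\alpha}$ confirms that a symmetric $\boldsymbol{\alpha}$ encodes no class bias. Function names are mine, not the book's.

```python
import numpy as np
from scipy.special import gammaln

def log_dirichlet_norm(alpha):
    """log C(alpha) = log Gamma(alpha_tilde) - sum_j log Gamma(alpha_j),
    with alpha_tilde = sum_j alpha_j (Eqs. 7.114, 7.115)."""
    alpha = np.asarray(alpha)
    return gammaln(alpha.sum()) - gammaln(alpha).sum()

def dirichlet_mean(alpha):
    """E(w_k) = alpha / alpha_tilde under Dir(w_k | alpha)."""
    alpha = np.asarray(alpha)
    return alpha / alpha.sum()

alpha = np.full(3, 1e-2)          # symmetric, weak prior
print(log_dirichlet_norm(alpha))  # log C(alpha)
print(dirichlet_mean(alpha))      # uniform: [1/3, 1/3, 1/3]
```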
In contrast to the relation of the different elements of $\boldsymbol{\alpha}$ to each other, their absolute magnitude specifies the strength of the prior, that is, how strongly the prior affects the posterior in the light of further evidence. Intuitively speaking, a change of 1 in an element of $\boldsymbol{\alpha}$ represents one observation of the associated class. Thus, to keep the prior non-informative, it should be set to small positive values, such as, for example, $\boldsymbol{\alpha} = (10^{-2}, \ldots, 10^{-2})^{\mathrm{T}}$.
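The "one observation per unit of $\boldsymbol{\alpha}$" intuition reflects standard Dirichlet–multinomial conjugacy, which this passage does not spell out: the posterior is again a Dirichlet whose parameters are the prior's plus the observed class counts. A sketch under that standard assumption, with an illustrative function name:

```python
import numpy as np

def dirichlet_posterior(alpha, Y):
    """Posterior Dirichlet parameters after observing 1-of-D_Y coded
    targets Y (one row per observation): alpha + per-class counts.
    Standard conjugate update, assumed rather than quoted from the text."""
    return np.asarray(alpha) + np.asarray(Y).sum(axis=0)

alpha = np.full(3, 1e-2)                  # weak prior from the text
Y = np.array([[0, 1, 0], [0, 1, 0], [1, 0, 0]])
print(dirichlet_posterior(alpha, Y))      # [1.01, 2.01, 0.01]
```

With prior values this small, even a handful of observations dominates the posterior, which is exactly what "non-informative" means here.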
Besides a different classifier model, no further modifications are required to the Bayesian LCS model. Its hidden variables are now $U = \{\mathbf{W}, \mathbf{Z}, \mathbf{V}, \boldsymbol{\beta}\}$, where $\mathbf{W} = \{\mathbf{w}_k\}$ is the set of all classifiers' parameters, whose distribution factorises with respect to $k$, that is,

$$p(\mathbf{W}) = \prod_k p(\mathbf{w}_k). \tag{7.116}$$
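Since Eq. (7.116) factorises over classifiers, the log prior on $\mathbf{W}$ is simply a sum of per-classifier Dirichlet log densities. A minimal sketch using scipy's ready-made Dirichlet density (the helper name is mine):

```python
import numpy as np
from scipy.stats import dirichlet

def log_p_W(W, alpha):
    """log p(W) = sum_k log Dir(w_k | alpha), following Eq. (7.116).
    W is an iterable of per-classifier parameter vectors, all sharing
    the same prior vector alpha as stated in the text."""
    return sum(dirichlet.logpdf(w_k, alpha) for w_k in W)

alpha = np.full(3, 1e-2)
W = [np.array([0.2, 0.5, 0.3]), np.array([0.1, 0.1, 0.8])]
print(log_p_W(W, alpha))
```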
Assuming knowledge of $\mathbf{X}$ and $\mathbf{M}$, the joint distribution of data and hidden variables is given by

$$p(\mathbf{Y}, U \mid \mathbf{X}) = p(\mathbf{Y} \mid \mathbf{X}, \mathbf{W}, \mathbf{Z})\, p(\mathbf{W})\, p(\mathbf{Z} \mid \mathbf{X}, \mathbf{V})\, p(\mathbf{V} \mid \boldsymbol{\beta})\, p(\boldsymbol{\beta}). \tag{7.117}$$
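To make the factorisation of Eq. (7.117) explicit in code, the schematic below assembles the log joint factor by factor. The gating densities $p(\mathbf{Z} \mid \mathbf{X}, \mathbf{V})$, $p(\mathbf{V} \mid \boldsymbol{\beta})$ and $p(\boldsymbol{\beta})$ are defined elsewhere in the chapter and appear here only as placeholder callables; their signatures are assumptions for illustration, not the book's notation.

```python
def log_joint(Y, U, X, log_lik, log_prior_W, log_p_Z, log_p_V, log_p_beta):
    """log p(Y, U | X), mirroring the factorisation of Eq. (7.117).

    U bundles the hidden variables {W, Z, V, beta}; the five callables
    are placeholders for the log densities defined in the chapter.
    """
    W, Z, V, beta = U["W"], U["Z"], U["V"], U["beta"]
    return (log_lik(Y, X, W, Z)       # log p(Y | X, W, Z)
            + log_prior_W(W)          # log p(W), Eq. (7.116)
            + log_p_Z(Z, X, V)        # log p(Z | X, V)
            + log_p_V(V, beta)        # log p(V | beta)
            + log_p_beta(beta))       # log p(beta)
```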