The Optimal Set of Classifiers - Design and Analysis of Learning Classifier Systems


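Table 7.1. Bayesian LCS model, with all its components. For more details on the model see Sect. 7.2.

Data, Model Structure, and Likelihood

    N observations: $\{(\mathbf{x}_n, \mathbf{y}_n)\}$, $\mathbf{x}_n \in \mathcal{X} = \mathbb{R}^{D_X}$, $\mathbf{y}_n \in \mathcal{Y} = \mathbb{R}^{D_Y}$
    Model structure: $\mathcal{M} = \{K, \mathbf{M}\}$, $k = 1, \dots, K$
    K classifiers
    Matching functions: $\mathbf{M} = \{m_k : \mathcal{X} \to [0, 1]\}$
    Likelihood: $p(\mathbf{Y} \mid \mathbf{X}, \mathbf{W}, \boldsymbol{\tau}, \mathbf{Z}) = \prod_{n=1}^{N} \prod_{k=1}^{K} p(\mathbf{y}_n \mid \mathbf{x}_n, \mathbf{W}_k, \tau_k)^{z_{nk}}$

Classifiers

    Variables
        Weight matrices: $\mathbf{W} = \{\mathbf{W}_k\}$, $\mathbf{W}_k \in \mathbb{R}^{D_Y \times D_X}$
        Noise precisions: $\boldsymbol{\tau} = \{\tau_k\}$
        Weight shrinkage priors: $\boldsymbol{\alpha} = \{\alpha_k\}$
        Noise precision prior parameters: $a_\tau, b_\tau$
        $\alpha$-hyperprior parameters: $a_\alpha, b_\alpha$

    Model
        $p(\mathbf{y} \mid \mathbf{x}, \mathbf{W}_k, \tau_k) = \mathcal{N}(\mathbf{y} \mid \mathbf{W}_k \mathbf{x}, \tau_k^{-1} \mathbf{I}) = \prod_{j=1}^{D_Y} \mathcal{N}(y_j \mid \mathbf{w}_{kj}^T \mathbf{x}, \tau_k^{-1})$

    Priors
        $p(\mathbf{W}_k, \tau_k \mid \alpha_k) = \prod_{j=1}^{D_Y} \left( \mathcal{N}(\mathbf{w}_{kj} \mid \mathbf{0}, (\alpha_k \tau_k)^{-1} \mathbf{I}) \, \mathrm{Gam}(\tau_k \mid a_\tau, b_\tau) \right)$
        $p(\alpha_k) = \mathrm{Gam}(\alpha_k \mid a_\alpha, b_\alpha)$

Mixing

    Variables
        Latent variables: $\mathbf{Z} = \{\mathbf{z}_n\}$, $\mathbf{z}_n = (z_{n1}, \dots, z_{nK})^T \in \{0, 1\}^K$, 1-of-$K$
        Mixing weight vectors: $\mathbf{V} = \{\mathbf{v}_k\}$, $\mathbf{v}_k \in \mathbb{R}^{D_V}$
        Mixing weight shrinkage priors: $\boldsymbol{\beta} = \{\beta_k\}$
        $\beta$-hyperprior parameters: $a_\beta, b_\beta$

    Model
        $p(\mathbf{Z} \mid \mathbf{X}, \mathbf{V}, \mathbf{M}) = \prod_{n=1}^{N} \prod_{k=1}^{K} g_k(\mathbf{x}_n)^{z_{nk}}$
        $g_k(\mathbf{x}) \equiv p(z_k = 1 \mid \mathbf{x}, \mathbf{v}_k, m_k) = \frac{m_k(\mathbf{x}) \exp(\mathbf{v}_k^T \phi(\mathbf{x}))}{\sum_{j=1}^{K} m_j(\mathbf{x}) \exp(\mathbf{v}_j^T \phi(\mathbf{x}))}$

    Priors
        $p(\mathbf{v}_k \mid \beta_k) = \mathcal{N}(\mathbf{v}_k \mid \mathbf{0}, \beta_k^{-1} \mathbf{I})$
        $p(\beta_k) = \mathrm{Gam}(\beta_k \mid a_\beta, b_\beta)$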
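The mixing model in Table 7.1 weights each classifier by its matching function times an exponential of a linear score, normalised over all classifiers. The following is a minimal sketch of that gating function in plain Python, not the book's implementation. The function name `gating`, the list-based representation of vectors, and the default constant feature map `phi` are assumptions made for illustration only.

```python
import math


def gating(x, matching, V, phi=lambda x: [1.0]):
    """Return [g_1(x), ..., g_K(x)] for a single input x.

    matching: list of K functions m_k(x) with values in [0, 1]
    V:        list of K mixing weight vectors v_k (lists of floats)
    phi:      feature map applied to x (assumed constant by default)
    """
    feats = phi(x)
    # Linear scores v_k^T phi(x), one per classifier.
    scores = [sum(vi * fi for vi, fi in zip(v, feats)) for v in V]
    # Subtracting the largest score leaves the ratio unchanged
    # and keeps exp() from overflowing.
    shift = max(scores)
    terms = [m(x) * math.exp(s - shift) for m, s in zip(matching, scores)]
    total = sum(terms)
    if total == 0.0:
        raise ValueError("no classifier matches x")
    return [t / total for t in terms]
```

With equal mixing weights, two classifiers that both match an input share it equally, while a classifier whose matching function returns 0 receives a gating weight of 0 regardless of its mixing weight vector.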

7.2.1 Data, Model Structure, and Likelihood

To evaluate the evidence of a certain model structure $\mathcal{M}$, the data and the model structure need to be known. The data consists of $N$ observations, each given by an input/output pair $(\mathbf{x}_n, \mathbf{y}_n)$. The input vector $\mathbf{x}_n$ is an element of the $D_X$-dimensional real input space $\mathcal{X} = \mathbb{R}^{D_X}$, and the output vector $\mathbf{y}_n$ is an element of the $D_Y$-dimensional real output space $\mathcal{Y} = \mathbb{R}^{D_Y}$. Hence, $\mathbf{x}_n$ has $D_X$ elements and $\mathbf{y}_n$ has $D_Y$ elements. The input matrix $\mathbf{X}$ and output matrix $\mathbf{Y}$ are defined according to (3.4).

The data is assumed to be standardised by a linear transformation such that all $\mathbf{x}$ and $\mathbf{y}$ have mean 0 and a range of 1. The purpose of this standardisation is the same as the one given by Chipman, George and McCulloch [62], which is to make it easier to intuitively gauge parameter values. For example, with the data being standardised, a weight value of 2 can be considered large, as a half-range increase in $\mathbf{x}$ would result in a full-range increase in $\mathbf{y}$.
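One way to read this standardisation is as a shift-and-scale applied to each input and output dimension separately. The sketch below is an illustration under that reading, not the procedure used in the text: it assumes "range" means the maximum minus the minimum, and the function name `standardise` is hypothetical.

```python
def standardise(values):
    """Linearly map a sequence of numbers to mean 0 and range 1.

    "Range" is taken here to be max - min (an assumption).
    """
    lo, hi = min(values), max(values)
    spread = hi - lo
    if spread == 0:
        raise ValueError("a constant column has no range to scale")
    mean = sum(values) / len(values)
    # Shifting by the mean centres the data; dividing by the
    # spread makes max - min equal to 1.
    return [(v - mean) / spread for v in values]


# One input dimension observed at four points.
column = standardise([2.0, 4.0, 6.0, 8.0])

# With a weight of 2, moving x by half its range (0.5)
# moves the prediction by the full output range (1.0).
delta_y = 2.0 * 0.5
```

Because the transformation is linear, it can be inverted after training to express predictions in the original units.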
