Table 7.1. The Bayesian LCS model, with all its components. For more details on the model see Sect. 7.2.

Data, Model Structure, and Likelihood
  Data: $N$ observations $\{(x_n, y_n)\}$, $x_n \in \mathcal{X} = \mathbb{R}^{D_X}$, $y_n \in \mathcal{Y} = \mathbb{R}^{D_Y}$
  Model structure: $\mathcal{M} = \{K, \mathbf{M}\}$, $k = 1, \dots, K$
    $K$ classifiers
    matching functions $\mathbf{M} = \{m_k : \mathcal{X} \rightarrow [0, 1]\}$
  Likelihood: $p(\mathbf{Y} \mid \mathbf{X}, \mathbf{W}, \boldsymbol{\tau}, \mathbf{Z}) = \prod_{n=1}^{N} \prod_{k=1}^{K} p(y_n \mid x_n, \mathbf{W}_k, \tau_k)^{z_{nk}}$

Classifiers
  Variables:
    weight matrices $\mathbf{W} = \{\mathbf{W}_k\}$, $\mathbf{W}_k \in \mathbb{R}^{D_Y \times D_X}$
    noise precisions $\boldsymbol{\tau} = \{\tau_k\}$
    weight shrinkage priors $\boldsymbol{\alpha} = \{\alpha_k\}$
    noise precision prior parameters $a_\tau, b_\tau$
    $\alpha$-hyperprior parameters $a_\alpha, b_\alpha$
  Model: $p(y \mid x, \mathbf{W}_k, \tau_k) = \mathcal{N}(y \mid \mathbf{W}_k x, \tau_k^{-1}\mathbf{I}) = \prod_{j=1}^{D_Y} \mathcal{N}(y_j \mid \mathbf{w}_{kj}^{\mathsf{T}} x, \tau_k^{-1})$
  Priors: $p(\mathbf{W}_k, \tau_k \mid \alpha_k) = \prod_{j=1}^{D_Y} \big( \mathcal{N}(\mathbf{w}_{kj} \mid \mathbf{0}, (\alpha_k \tau_k)^{-1}\mathbf{I}) \, \mathrm{Gam}(\tau_k \mid a_\tau, b_\tau) \big)$
    $p(\alpha_k) = \mathrm{Gam}(\alpha_k \mid a_\alpha, b_\alpha)$

Mixing
  Variables:
    latent variables $\mathbf{Z} = \{z_n\}$, $z_n = (z_{n1}, \dots, z_{nK})^{\mathsf{T}} \in \{0, 1\}^K$, 1-of-$K$
    mixing weight vectors $\mathbf{V} = \{v_k\}$, $v_k \in \mathbb{R}^{D_V}$
    mixing weight shrinkage priors $\boldsymbol{\beta} = \{\beta_k\}$
    $\beta$-hyperprior parameters $a_\beta, b_\beta$
  Model: $p(\mathbf{Z} \mid \mathbf{X}, \mathbf{V}, \mathbf{M}) = \prod_{n=1}^{N} \prod_{k=1}^{K} g_k(x_n)^{z_{nk}}$
    $g_k(x) \equiv p(z_k = 1 \mid x, v_k, m_k) = \dfrac{m_k(x) \exp(v_k^{\mathsf{T}} \phi(x))}{\sum_{j=1}^{K} m_j(x) \exp(v_j^{\mathsf{T}} \phi(x))}$
  Priors: $p(v_k \mid \beta_k) = \mathcal{N}(v_k \mid \mathbf{0}, \beta_k^{-1}\mathbf{I})$
    $p(\beta_k) = \mathrm{Gam}(\beta_k \mid a_\beta, b_\beta)$
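To make the model summary concrete, the following sketch evaluates the two quantities at the heart of the table: the matching-weighted softmax $g_k(x)$ and the Gaussian classifier likelihood $\mathcal{N}(y \mid \mathbf{W}_k x, \tau_k^{-1}\mathbf{I})$. It is a minimal illustration only, assuming numpy; the function names and the choice of basis $\phi$ are mine, not part of the model specification.

```python
import numpy as np

def gating(x, V, matching, phi):
    """Matching-weighted softmax g_k(x) from Table 7.1.

    V        -- (K, D_V) array of mixing weight vectors v_k
    matching -- list of K matching functions m_k(x) -> [0, 1]
    phi      -- basis function mapping x to a D_V vector
    """
    logits = V @ phi(x)                     # v_k^T phi(x) for all k
    logits -= logits.max()                  # stabilise exp() numerically
    m = np.array([m_k(x) for m_k in matching])
    weights = m * np.exp(logits)            # m_k(x) exp(v_k^T phi(x))
    return weights / weights.sum()          # normalise over the K classifiers

def classifier_loglik(y, x, W_k, tau_k):
    """log N(y | W_k x, tau_k^{-1} I) for a single classifier k."""
    resid = y - W_k @ x
    return 0.5 * y.size * np.log(tau_k / (2 * np.pi)) - 0.5 * tau_k * resid @ resid
```

Note that a classifier that does not match an input, $m_k(x) = 0$, receives zero gating weight for it, so the normalisation runs effectively over the matching classifiers only.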
7.2.1 Data, Model Structure, and Likelihood
To evaluate the evidence of a certain model structure $\mathcal{M}$, the data $\mathcal{D}$ and the model structure $\mathcal{M}$ need to be known. The data $\mathcal{D}$ consists of $N$ observations, each given by an input/output pair $(x_n, y_n)$. The input vector $x_n$ is an element of the $D_X$-dimensional real input space $\mathcal{X} = \mathbb{R}^{D_X}$, and the output vector $y_n$ is an element of the $D_Y$-dimensional real output space $\mathcal{Y} = \mathbb{R}^{D_Y}$. Hence, $x_n$ has $D_X$, and $y_n$ has $D_Y$ elements. The input matrix $\mathbf{X}$ and output matrix $\mathbf{Y}$ are defined according to (3.4).
The data is assumed to be standardised by a linear transformation such that all $x$ and $y$ have mean 0 and a range of 1. The purpose of this standardisation is the same as the one given by Chipman, George and McCulloch [62]: to make it easier to intuitively gauge parameter values. For example, with the data being standardised, a weight value of 2 can be considered large, as a half-range increase in $x$ would result in a full-range increase in $y$.
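As a concrete illustration of this preprocessing, the sketch below stacks the observations into the matrices $\mathbf{X}$ and $\mathbf{Y}$ (one observation per row, assumed here to match the convention of (3.4)) and applies such a linear transformation. The per-column implementation and all variable names are my own, hypothetical choices.

```python
import numpy as np

def standardise(A):
    """Linearly map each column to mean 0 and range 1."""
    A = A - A.mean(axis=0)                  # centre each dimension at 0
    span = A.max(axis=0) - A.min(axis=0)    # per-dimension range
    return A / np.where(span > 0, span, 1)  # guard against constant columns

# Hypothetical data: N = 100 observations with D_X = 3 inputs, D_Y = 1 output.
rng = np.random.default_rng(0)
X = standardise(rng.normal(size=(100, 3)))  # (N, D_X) input matrix
Y = standardise(rng.normal(size=(100, 1)))  # (N, D_Y) output matrix
```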