Rather than specifying a value for $\beta_k$, it is again modelled by the Gamma hyperprior

$$p(\beta_k) = \mathrm{Gam}(\beta_k \mid a_\beta, b_\beta) = \frac{b_\beta^{a_\beta}}{\Gamma(a_\beta)}\, \beta_k^{(a_\beta - 1)} \exp(-b_\beta \beta_k), \qquad (7.14)$$

with hyper-parameters set to $a_\beta = 10^{-2}$ and $b_\beta = 10^{-4}$ to obtain a broad and uninformative prior for the variance of the mixing weight vectors. The shape of the prior is the same as for $\tau_k^{-1}$, which is shown in Fig. 7.3(a).
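To make "broad and uninformative" concrete, the sketch below evaluates (7.14) with $a_\beta = 10^{-2}$ and $b_\beta = 10^{-4}$. It reports the prior mean $a_\beta / b_\beta$ and variance $a_\beta / b_\beta^2$ (standard Gamma moments under the shape/rate parameterisation used in (7.14)). It also evaluates the density of $\log \beta_k$ over several orders of magnitude. The helper names are hypothetical and chosen for illustration only.

```python
import math

# Hyper-parameters of the Gamma hyperprior (7.14), as given in the text.
A_BETA = 1e-2
B_BETA = 1e-4


def gamma_log_pdf(x: float, a: float, b: float) -> float:
    """log Gam(x | a, b) with shape a and rate b, as written in (7.14)."""
    if x <= 0.0:
        raise ValueError("Gamma density is only defined for x > 0")
    return a * math.log(b) + (a - 1.0) * math.log(x) - b * x - math.lgamma(a)


def gamma_mean_var(a: float, b: float) -> tuple[float, float]:
    """Mean a/b and variance a/b**2 of Gam(. | a, b)."""
    return a / b, a / b**2


def log_density_of_log(x: float, a: float, b: float) -> float:
    """log p(log x): the change of variables adds log|dx/d(log x)| = log x."""
    return gamma_log_pdf(x, a, b) + math.log(x)


mean, var = gamma_mean_var(A_BETA, B_BETA)
print(f"prior mean of beta_k: {mean:g}, prior std: {math.sqrt(var):g}")

# Evaluate the density of log(beta_k) for beta_k from 1e-3 to 1e3.
grid = [10.0**e for e in range(-3, 4)]
log_dens = [log_density_of_log(x, A_BETA, B_BETA) for x in grid]
for x, ld in zip(grid, log_dens):
    print(f"beta_k = {x:8g}   log p(log beta_k) = {ld:+.4f}")
print(f"spread over grid: {max(log_dens) - min(log_dens):.4f}")
```

With $a_\beta$ close to zero, the density behaves roughly like $\beta_k^{-1}$ until the $\exp(-b_\beta \beta_k)$ cut-off near $1/b_\beta$. As a result, $p(\log \beta_k)$ stays almost constant across the grid. This is one way to read the prior as expressing little preference for any particular scale of $\beta_k$.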
7.2.6 Joint Distribution over Random Variables
Assuming knowledge of $\mathbf{X}$ and $\mathcal{M}$, the joint distribution over all random variables decomposes into
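$$p(\mathbf{Y}, \mathbf{U} \mid \mathbf{X}) = p(\mathbf{Y} \mid \mathbf{X}, \mathbf{W}, \boldsymbol{\tau}, \mathbf{Z})\, p(\mathbf{W}, \boldsymbol{\tau} \mid \boldsymbol{\alpha})\, p(\boldsymbol{\alpha}) \times p(\mathbf{Z} \mid \mathbf{X}, \mathbf{V})\, p(\mathbf{V} \mid \boldsymbol{\beta})\, p(\boldsymbol{\beta}), \qquad (7.15)$$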
where $\mathbf{U}$ collectively denotes the hidden variables $\mathbf{U} = \{\mathbf{W}, \boldsymbol{\tau}, \boldsymbol{\alpha}, \mathbf{Z}, \mathbf{V}, \boldsymbol{\beta}\}$. This decomposition is also clearly visible in Fig. 7.2, where the dependency structure between the different variables and parameters is graphically illustrated. All priors are independent for different $k$, and so we have
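$$p(\mathbf{W}, \boldsymbol{\tau} \mid \boldsymbol{\alpha}) = \prod_{k=1}^{K} p(\mathbf{W}_k, \tau_k \mid \alpha_k), \qquad (7.16)$$

$$p(\boldsymbol{\alpha}) = \prod_{k=1}^{K} p(\alpha_k), \qquad (7.17)$$

$$p(\mathbf{V} \mid \boldsymbol{\beta}) = \prod_{k=1}^{K} p(\mathbf{v}_k \mid \beta_k), \qquad (7.18)$$

$$p(\boldsymbol{\beta}) = \prod_{k=1}^{K} p(\beta_k). \qquad (7.19)$$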
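The factorisation in (7.18) and (7.19) can be checked numerically on a small example. The sketch below computes $\log p(\mathbf{V} \mid \boldsymbol{\beta}) + \log p(\boldsymbol{\beta})$ in two ways. The first is a sum of independent per-$k$ terms. The second treats all $\mathbf{v}_k$ as one stacked vector with a block-diagonal precision. This section does not state the form of $p(\mathbf{v}_k \mid \beta_k)$, so the code assumes a zero-mean isotropic Gaussian $\mathcal{N}(\mathbf{v}_k \mid \mathbf{0}, \beta_k^{-1}\mathbf{I})$, with $\beta_k$ acting as a precision. The values of $K$, the dimension of $\mathbf{v}_k$, and the helper names are hypothetical.

```python
import math
import random

# Hyper-parameters of the Gamma hyperprior (7.14), as given in the text.
A_BETA, B_BETA = 1e-2, 1e-4


def gamma_log_pdf(x: float, a: float, b: float) -> float:
    """log Gam(x | a, b) with shape a and rate b."""
    return a * math.log(b) + (a - 1.0) * math.log(x) - b * x - math.lgamma(a)


def isotropic_normal_log_pdf(v: list[float], precision: float) -> float:
    """log N(v | 0, precision^{-1} I); ASSUMED form of p(v_k | beta_k)."""
    d = len(v)
    sq_norm = sum(x * x for x in v)
    return 0.5 * d * math.log(precision / (2.0 * math.pi)) - 0.5 * precision * sq_norm


def log_prior_factorised(V: list[list[float]], beta: list[float]) -> float:
    """Sum over k of log p(v_k | beta_k) + log p(beta_k), as in (7.18)-(7.19)."""
    return sum(
        isotropic_normal_log_pdf(v_k, b_k) + gamma_log_pdf(b_k, A_BETA, B_BETA)
        for v_k, b_k in zip(V, beta)
    )


def log_prior_stacked(V: list[list[float]], beta: list[float]) -> float:
    """Same quantity, treating all v_k as one vector with diagonal precision."""
    flat = [x for v_k in V for x in v_k]
    # Block-diagonal precision: beta_k repeated once per entry of v_k.
    prec = [b_k for v_k, b_k in zip(V, beta) for _ in v_k]
    n = len(flat)
    log_normal = (
        0.5 * sum(math.log(p) for p in prec)
        - 0.5 * n * math.log(2.0 * math.pi)
        - 0.5 * sum(p * x * x for p, x in zip(prec, flat))
    )
    log_gamma = sum(gamma_log_pdf(b_k, A_BETA, B_BETA) for b_k in beta)
    return log_normal + log_gamma


random.seed(0)
K, D_V = 3, 2  # hypothetical: number of k's and dimension of each v_k
V = [[random.gauss(0.0, 1.0) for _ in range(D_V)] for _ in range(K)]
beta = [random.uniform(0.1, 10.0) for _ in range(K)]

per_k = log_prior_factorised(V, beta)
stacked = log_prior_stacked(V, beta)
print(f"factorised: {per_k:.10f}   stacked: {stacked:.10f}")
```

Both routes give the same value up to floating-point rounding. Because the log prior is a plain sum of per-$k$ terms, each term can be handled independently of the others. This is the property exploited when deriving the evidence.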
By inspecting (7.6) and (7.12), it can be seen that, as with the priors, both $p(\mathbf{Y} \mid \mathbf{X}, \mathbf{W}, \boldsymbol{\tau}, \mathbf{Z})$ and $p(\mathbf{Z} \mid \mathbf{X}, \mathbf{V})$ factorise over $k$. Therefore the joint distribution (7.15) factorises over $k$ as well. This property will be used when deriving the expressions required to compute the evidence $p(\mathcal{D} \mid \mathcal{M})$.
7.3 Evaluating the Model Evidence
This rather technical section is devoted to deriving an expression for the model evidence $p(\mathcal{D} \mid \mathcal{M})$ for use in (7.3). Evaluating (7.2) does not yield a closed-form