Rather than specifying a value for $\beta_k$, it is again modelled by the Gamma hyperprior

$$p(\beta_k) = \mathrm{Gam}(\beta_k \mid a_\beta, b_\beta) = \frac{b_\beta^{a_\beta} \beta_k^{(a_\beta - 1)}}{\Gamma(a_\beta)} \exp(-b_\beta \beta_k), \tag{7.14}$$

with hyper-parameters set to $a_\beta = 10^{-2}$ and $b_\beta = 10^{-4}$ to get a broad and uninformative prior for the variance of the mixing weight vectors. The shape of the prior is the same as for $\tau_k^{-1}$, which is shown in Fig. 7.3(a).
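To get a feel for how broad this hyperprior is, the density in (7.14) can be evaluated numerically. The following is a minimal sketch, assuming the standard shape/rate parameterisation of the Gamma distribution; the function name `gamma_log_pdf` and the evaluation points are illustrative choices, not part of the original text.

```python
import math

def gamma_log_pdf(x, a, b):
    """Log-density of Gam(x | a, b) with shape a and rate b (assumed form), x > 0."""
    if x <= 0.0:
        return float("-inf")
    # ln[ b^a x^(a-1) exp(-b x) / Gamma(a) ]
    return a * math.log(b) + (a - 1.0) * math.log(x) - b * x - math.lgamma(a)

# Hyper-parameter values quoted in the text.
A_BETA = 1e-2
B_BETA = 1e-4

# Mean a/b and variance a/b^2 of a Gamma distribution in shape/rate form.
prior_mean = A_BETA / B_BETA
prior_var = A_BETA / B_BETA**2

# Log-density at a few illustrative points spanning six orders of magnitude.
log_densities = {x: gamma_log_pdf(x, A_BETA, B_BETA) for x in (1e-3, 1.0, 1e3)}

if __name__ == "__main__":
    print("mean:", prior_mean, "variance:", prior_var)
    for x, lp in log_densities.items():
        print(f"ln p(beta_k = {x:g}) = {lp:.4f}")
```

With these settings the variance is several orders of magnitude larger than the mean, which is one way to see why the prior is described as broad.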
7.2.6 Joint Distribution over Random Variables
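Assuming knowledge of $\mathbf{X}$ and $\mathcal{M}$, the joint distribution over all random variables decomposes into

$$p(\mathbf{Y}, \mathbf{U} \mid \mathbf{X}) = p(\mathbf{Y} \mid \mathbf{X}, \mathbf{W}, \boldsymbol{\tau}, \mathbf{Z}) \, p(\mathbf{W}, \boldsymbol{\tau} \mid \boldsymbol{\alpha}) \, p(\boldsymbol{\alpha}) \times p(\mathbf{Z} \mid \mathbf{X}, \mathbf{V}) \, p(\mathbf{V} \mid \boldsymbol{\beta}) \, p(\boldsymbol{\beta}), \tag{7.15}$$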
where $\mathbf{U}$ collectively denotes the hidden variables $\mathbf{U} = \{\mathbf{W}, \boldsymbol{\tau}, \boldsymbol{\alpha}, \mathbf{Z}, \mathbf{V}, \boldsymbol{\beta}\}$. This decomposition is also clearly visible in Fig. 7.2, where the dependency structure between the different variables and parameters is graphically illustrated. All priors are independent for different $k$, and so we have
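$$p(\mathbf{W}, \boldsymbol{\tau} \mid \boldsymbol{\alpha}) = \prod_{k=1}^{K} p(\mathbf{W}_k, \tau_k \mid \alpha_k), \tag{7.16}$$

$$p(\boldsymbol{\alpha}) = \prod_{k=1}^{K} p(\alpha_k), \tag{7.17}$$

$$p(\mathbf{V} \mid \boldsymbol{\beta}) = \prod_{k=1}^{K} p(\mathbf{v}_k \mid \beta_k), \tag{7.18}$$

$$p(\boldsymbol{\beta}) = \prod_{k=1}^{K} p(\beta_k). \tag{7.19}$$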
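Because each prior is a product over $k$, its logarithm is a sum of per-classifier terms. As a small illustration, the sketch below evaluates $\ln p(\boldsymbol{\beta})$ from (7.19) using the Gamma density of (7.14), assuming the standard shape/rate form. The function names and the example values of $\beta_k$ are hypothetical.

```python
import math

def gamma_log_pdf(x, a, b):
    """Log-density of Gam(x | a, b) with shape a and rate b (assumed form), x > 0."""
    if x <= 0.0:
        return float("-inf")
    return a * math.log(b) + (a - 1.0) * math.log(x) - b * x - math.lgamma(a)

def log_prior_beta(beta, a_beta=1e-2, b_beta=1e-4):
    """ln p(beta) = sum over k of ln Gam(beta_k | a_beta, b_beta), following (7.19)."""
    return sum(gamma_log_pdf(beta_k, a_beta, b_beta) for beta_k in beta)

# Hypothetical values of beta_k for K = 3 classifiers, chosen for illustration only.
beta = [0.5, 2.0, 10.0]
total = log_prior_beta(beta)

if __name__ == "__main__":
    print(f"ln p(beta) = {total:.4f}")
```

Working in the log domain avoids numerical underflow when $K$ is large, and each term of the sum depends on a single $k$ only.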
By inspecting (7.6) and (7.12) it can be seen that, similar to the priors, both $p(\mathbf{Y} \mid \mathbf{X}, \mathbf{W}, \boldsymbol{\tau}, \mathbf{Z})$ and $p(\mathbf{Z} \mid \mathbf{X}, \mathbf{V})$ factorise over $k$, and therefore the joint distribution (7.15) factorises over $k$ as well. This property will be used when deriving the required expressions to compute the evidence $p(\mathcal{D} \mid \mathcal{M})$.
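The factorisation is perhaps easiest to see in the log domain. Taking the logarithm of (7.15) and substituting (7.16) to (7.19) gives the following restatement, which introduces no new assumptions:

```latex
\ln p(\mathbf{Y}, \mathbf{U} \mid \mathbf{X})
  = \ln p(\mathbf{Y} \mid \mathbf{X}, \mathbf{W}, \boldsymbol{\tau}, \mathbf{Z})
  + \ln p(\mathbf{Z} \mid \mathbf{X}, \mathbf{V})
  + \sum_{k=1}^{K} \Big[ \ln p(\mathbf{W}_k, \tau_k \mid \alpha_k)
                        + \ln p(\alpha_k)
                        + \ln p(\mathbf{v}_k \mid \beta_k)
                        + \ln p(\beta_k) \Big]
```

The first two terms likewise split into sums over $k$ once the explicit forms (7.6) and (7.12) are inserted, so the whole log-joint becomes a sum of per-classifier contributions.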
7.3 Evaluating the Model Evidence
This rather technical section is devoted to deriving an expression for the model evidence $p(\mathcal{D} \mid \mathcal{M})$ for use in (7.3). Evaluating (7.2) does not yield a closed-form