where $H$ is the Hessian matrix of $E(V)$ as used in the IRLS algorithm. Overall, the Laplace approximation to the posterior $q_V(V)$ is given by the multivariate Gaussian

$$ q_V(V) \approx \mathcal{N}(V \mid V^*, \Lambda_V^{-1}), \qquad (7.51) $$

where $V^*$ is the solution to (7.47), and $\Lambda_V$ is the Hessian matrix evaluated at $V^*$.
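To make the Laplace step concrete, the following Python sketch finds a posterior mode numerically and takes the inverse Hessian at that mode as the Gaussian covariance, as in (7.51). It is a minimal illustration under assumed simplifications: a two-class logistic stand-in for the book's softmax mixing model, and hypothetical names (`neg_log_post`, `laplace_approx`); it is not the book's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_post(v, X, r, beta):
    """E(v): negative log-posterior of the mixing weights of one classifier.
    Hypothetical stand-in: logistic likelihood + Gaussian shrinkage prior."""
    g = 1.0 / (1.0 + np.exp(-X @ v))               # mixing probabilities
    nll = -np.sum(r * np.log(g) + (1.0 - r) * np.log(1.0 - g))
    return nll + 0.5 * beta * (v @ v)              # shrinkage term (beta/2) v'v

def laplace_approx(X, r, beta):
    """Gaussian approximation N(v | v_star, Lambda^{-1}) in the spirit of (7.51)."""
    D = X.shape[1]
    v_star = minimize(neg_log_post, np.zeros(D), args=(X, r, beta)).x
    # Hessian of E(v) at the mode: X' diag(g(1-g)) X + beta*I, playing the
    # role of Lambda_V; its inverse is the Gaussian covariance.
    g = 1.0 / (1.0 + np.exp(-X @ v_star))
    Lambda = X.T @ (X * (g * (1.0 - g))[:, None]) + beta * np.eye(D)
    return v_star, np.linalg.inv(Lambda)
```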
7.3.5 Mixing Weight Priors $q_\beta(\beta)$
By (7.19), $p(\beta)$ factorises with respect to $k$, and thus allows us to find $q_\beta(\beta)$ for each classifier separately, which, by (7.15), (7.18) and (7.24), requires the evaluation of

$$ \ln q_\beta(\beta_k) = E_V(\ln p(v_k \mid \beta_k)) + \ln p(\beta_k). \qquad (7.52) $$
Using (7.13) and (7.14), the expectation and log-density are given by
$$ E_V(\ln p(v_k \mid \beta_k)) = \frac{D_V}{2} \ln \beta_k - \frac{\beta_k}{2} E_V(v_k^T v_k) + \text{const.}, \qquad (7.53) $$

$$ \ln p(\beta_k) = (a_\beta - 1) \ln \beta_k - \beta_k b_\beta + \text{const.} \qquad (7.54) $$
Combining the above, we get the variational posterior
$$ \begin{aligned} \ln q_\beta(\beta_k) &= \left( a_\beta - 1 + \frac{D_V}{2} \right) \ln \beta_k - \left( b_\beta + \frac{1}{2} E_V(v_k^T v_k) \right) \beta_k + \text{const.} \\ &= \ln \text{Gam}(\beta_k \mid a_{\beta_k}, b_{\beta_k}), \end{aligned} \qquad (7.55) $$
with the distribution parameters
$$ a_{\beta_k} = a_\beta + \frac{D_V}{2}, \qquad (7.56) $$

$$ b_{\beta_k} = b_\beta + \frac{1}{2} E_V(v_k^T v_k). \qquad (7.57) $$
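Since $q_V$ is Gaussian, the expectation in (7.57) follows from the standard second-moment identity $E_V(v_k^T v_k) = v_k^{*T} v_k^* + \mathrm{Tr}(\Lambda_{v_k}^{-1})$, where $\Lambda_{v_k}^{-1}$ denotes the covariance block belonging to $v_k$. A minimal Python sketch of the updates (7.56) and (7.57), assuming that mean and covariance block are available (function and variable names are illustrative):

```python
import numpy as np

def update_gamma_posterior(a_beta, b_beta, v_star, cov_v):
    """Variational update of q_beta(beta_k) = Gam(beta_k | a_beta_k, b_beta_k).
    v_star : posterior mean of v_k;  cov_v : its covariance block."""
    D_V = v_star.shape[0]
    E_vv = v_star @ v_star + np.trace(cov_v)   # E_V(v_k' v_k)
    a_k = a_beta + D_V / 2.0                   # (7.56)
    b_k = b_beta + 0.5 * E_vv                  # (7.57)
    return a_k, b_k
```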
As the priors on $v_k$ are similar to the ones on $w_k$, they cause the same effect: as $b_{\beta_k}$ increases proportionally to the expected size $\|v_k\|^2$, the expectation $E_\beta(\beta_k) = a_{\beta_k}/b_{\beta_k}$ decreases in proportion to it. This expectation determines the shrinkage on $v_k$ (see (7.47)), and thus the strength of the shrinkage prior is reduced if $v_k$ is expected to have large elements, which is an intuitively sensible procedure.
7.3.6 Latent Variables $q_Z(Z)$
To get the variational posterior over the latent variables $Z$, we need to evaluate (7.24) by the use of (7.15), that is,

$$ \ln q_Z(Z) = E_{W,\tau}(\ln p(Y \mid W, \tau, Z)) + E_V(\ln p(Z \mid V)) + \text{const.} \qquad (7.58) $$
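Although (7.58) is expanded only after this point, its structure already fixes the computation: each observation contributes an unnormalised log-responsibility that is the sum of the two expectation terms, normalised over the classifiers. A hedged Python sketch under that reading, with the expectation terms assumed to be precomputed arrays (not the book's code):

```python
import numpy as np
from scipy.special import logsumexp

def update_latent_posterior(E_log_lik, E_log_gate):
    """Variational update of q_Z(Z) in the spirit of (7.58).
    E_log_lik[n, k]  : assumed E_{W,tau}(ln p(y_n | w_k, tau_k))
    E_log_gate[n, k] : assumed E_V(ln g_k(x_n))
    Returns responsibilities r[n, k] = q_Z(z_nk = 1), rows summing to 1."""
    log_rho = E_log_lik + E_log_gate           # unnormalised, up to a const.
    log_r = log_rho - logsumexp(log_rho, axis=1, keepdims=True)
    return np.exp(log_r)
```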