7.3.8 The Variational Bound $\mathcal{L}(q)$
We are most interested in finding the value for $\mathcal{L}(q)$ by (7.21), as it provides us with an approximated lower bound on the logarithm of the model evidence $\ln p(Y)$, and is the actual expression that is to be maximised. Evaluating (7.21) by using the distribution decomposition according to (7.15), the variational bound is given by
$$
\begin{aligned}
\mathcal{L}(q) &= \int q(U) \ln \frac{p(Y, U)}{q(U)} \,\mathrm{d}U \\
&= \mathbb{E}_{W,\tau,\alpha,Z,V,\beta}\big(\ln p(Y, W, \tau, \alpha, Z, V, \beta)\big)
 - \mathbb{E}_{W,\tau,\alpha,Z,V,\beta}\big(\ln q(W, \tau, \alpha, Z, V, \beta)\big) \\
&= \mathbb{E}_{W,\tau,Z}\big(\ln p(Y \,|\, W, \tau, Z)\big)
 + \mathbb{E}_{W,\tau,\alpha}\big(\ln p(W, \tau \,|\, \alpha)\big)
 + \mathbb{E}_{\alpha}\big(\ln p(\alpha)\big) \\
&\quad + \mathbb{E}_{Z,V}\big(\ln p(Z \,|\, V)\big)
 + \mathbb{E}_{V,\beta}\big(\ln p(V \,|\, \beta)\big)
 + \mathbb{E}_{\beta}\big(\ln p(\beta)\big) \\
&\quad - \mathbb{E}_{W,\tau}\big(\ln q(W, \tau)\big)
 - \mathbb{E}_{\alpha}\big(\ln q(\alpha)\big)
 - \mathbb{E}_{Z}\big(\ln q(Z)\big)
 - \mathbb{E}_{V}\big(\ln q(V)\big)
 - \mathbb{E}_{\beta}\big(\ln q(\beta)\big),
\end{aligned}
\tag{7.75}
$$
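Equation (7.75) is the generic bound $\mathcal{L}(q) = \mathbb{E}_q(\ln p(Y,U)) - \mathbb{E}_q(\ln q(U))$ expanded over the factors of the model. As a minimal numerical sketch of why it lower-bounds $\ln p(Y)$, consider a hypothetical three-state discrete latent variable (illustrative numbers only, not the book's model); the bound never exceeds the log-evidence, and becomes tight when $q$ equals the true posterior:

```python
import numpy as np

# Hypothetical joint p(Y, u) over three latent states u, for a fixed observed Y.
p_joint = np.array([0.10, 0.25, 0.15])
log_evidence = np.log(p_joint.sum())            # ln p(Y) = ln sum_u p(Y, u)

# Any normalised variational distribution q(u) gives L(q) <= ln p(Y).
q = np.array([0.3, 0.4, 0.3])
L = np.sum(q * (np.log(p_joint) - np.log(q)))   # discretised form of (7.21)/(7.75)
assert L < log_evidence

# The bound is tight when q is the true posterior p(u|Y) = p(Y,u)/p(Y).
q_post = p_joint / p_joint.sum()
L_tight = np.sum(q_post * (np.log(p_joint) - np.log(q_post)))
assert abs(L_tight - log_evidence) < 1e-12
```

The gap $\ln p(Y) - \mathcal{L}(q)$ is exactly the KL divergence between $q$ and the true posterior, which is why maximising $\mathcal{L}(q)$ drives $q$ towards that posterior.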
where all expectations are taken with respect to the variational distribution $q$. These are evaluated one by one, using the previously derived moments of the variational posteriors.
To derive $\mathbb{E}_{W,\tau,Z}(\ln p(Y \,|\, W, \tau, Z))$, we use (7.6) and (7.7) to get
$$
\begin{aligned}
\mathbb{E}_{W,\tau,Z}\big(\ln p(Y \,|\, W, \tau, Z)\big)
&= \sum_n \sum_k \mathbb{E}_Z(z_{nk}) \sum_j \mathbb{E}_{W,\tau}\Big(\ln \mathcal{N}\big(y_{nj} \,\big|\, w_{kj}^{\mathsf T} x_n, \tau_k^{-1}\big)\Big) \\
&= \sum_n \sum_k r_{nk} \sum_j \left( -\frac{1}{2} \ln 2\pi + \frac{1}{2} \mathbb{E}_\tau(\ln \tau_k)
 - \frac{1}{2} \mathbb{E}_{W,\tau}\Big(\tau_k \big(y_{nj} - w_{kj}^{\mathsf T} x_n\big)^2\Big) \right) \\
&= \sum_k \Bigg( \frac{D_Y}{2} \sum_n r_{nk} \big( \psi(a_{\tau_k}) - \ln b_{\tau_k} - \ln 2\pi \big)
 - \frac{1}{2} \sum_n r_{nk} \sum_j \left( \frac{a_{\tau_k}}{b_{\tau_k}} \big(y_{nj} - w_{kj}^{\mathsf T} x_n\big)^2 + x_n^{\mathsf T} \Lambda_k^{-1} x_n \right) \Bigg) \\
&= \sum_k \Bigg( \frac{D_Y}{2} \sum_n r_{nk} \big( \psi(a_{\tau_k}) - \ln b_{\tau_k} - \ln 2\pi \big)
 - \frac{1}{2} \sum_n r_{nk} \left( \frac{a_{\tau_k}}{b_{\tau_k}} \big\| y_n - W_k x_n \big\|^2 + D_Y\, x_n^{\mathsf T} \Lambda_k^{-1} x_n \right) \Bigg).
\end{aligned}
\tag{7.76}
$$
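The collapse from the per-element sum over $j$ in (7.76) to the final line (pulling $D_Y$ out of the constant and trace terms, and gathering the squared residuals into $\|y_n - W_k x_n\|^2$) can be checked numerically. The following is a minimal sketch with hypothetical variational parameters (random $r_{nk}$, $W_k$, $a_{\tau_k}$, $b_{\tau_k}$, $\Lambda_k^{-1}$, and arbitrary sizes), not the book's implementation:

```python
import numpy as np
from math import lgamma

def digamma(x, h=1e-5):
    # numerical psi(x) as a central difference of ln Gamma(x)
    return (lgamma(x + h) - lgamma(x - h)) / (2 * h)

rng = np.random.default_rng(0)
N, K, D_X, D_Y = 5, 3, 4, 2                       # hypothetical problem sizes

X = rng.normal(size=(N, D_X))                     # inputs x_n
Y = rng.normal(size=(N, D_Y))                     # outputs y_n
R = rng.dirichlet(np.ones(K), N)                  # responsibilities r_nk = E_Z(z_nk)
W = rng.normal(size=(K, D_Y, D_X))                # posterior weight means, rows w_kj
a_tau = rng.uniform(1.0, 3.0, K)                  # Gamma parameters a_tau_k
b_tau = rng.uniform(1.0, 3.0, K)                  # Gamma parameters b_tau_k
A = rng.normal(size=(K, D_X, D_X))
Lam_inv = np.stack([np.linalg.inv(a @ a.T + np.eye(D_X)) for a in A])  # Lambda_k^{-1}

# Final (collapsed) line of (7.76)
total = 0.0
for k in range(K):
    E_ln_tau = digamma(a_tau[k]) - np.log(b_tau[k])       # E_tau(ln tau_k)
    resid = Y - X @ W[k].T                                # rows y_n - W_k x_n
    quad = np.einsum('ni,ij,nj->n', X, Lam_inv[k], X)     # x_n^T Lambda_k^{-1} x_n
    total += (D_Y / 2) * np.sum(R[:, k] * (E_ln_tau - np.log(2 * np.pi))) \
        - 0.5 * np.sum(R[:, k] * (a_tau[k] / b_tau[k] * np.sum(resid**2, axis=1)
                                  + D_Y * quad))

# Element-wise middle line of (7.76): sum over n, k, j
check = sum(
    R[n, k] * (-0.5 * np.log(2 * np.pi)
               + 0.5 * (digamma(a_tau[k]) - np.log(b_tau[k]))
               - 0.5 * (a_tau[k] / b_tau[k] * (Y[n, j] - W[k, j] @ X[n])**2
                        + X[n] @ Lam_inv[k] @ X[n]))
    for n in range(N) for k in range(K) for j in range(D_Y))

assert np.isclose(total, check)
```

The two forms agree because the $-\frac{1}{2}\ln 2\pi$, $\frac{1}{2}\mathbb{E}_\tau(\ln\tau_k)$, and $x_n^{\mathsf T}\Lambda_k^{-1}x_n$ terms are constant in $j$, so summing over the $D_Y$ output dimensions simply multiplies them by $D_Y$.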
The classifier model parameters expectation $\mathbb{E}_{W,\tau,\alpha}(\ln p(W, \tau \,|\, \alpha))$ can be derived by using (7.7) and (7.16), and is given by
$$
\mathbb{E}_{W,\tau,\alpha}\big(\ln p(W, \tau \,|\, \alpha)\big)
= \sum_k \left( \sum_j \mathbb{E}_{W,\tau,\alpha}\Big(\ln \mathcal{N}\big(w_{kj} \,\big|\, \mathbf{0}, (\alpha_k \tau_k)^{-1} \mathbf{I}\big)\Big)
 + \mathbb{E}_\tau\big(\ln \operatorname{Gam}(\tau_k \,|\, a_\tau, b_\tau)\big) \right).
\tag{7.77}
$$