when maximising with respect to U_i, where the expectation is taken with respect
to all hidden variables except for U_i, and the constant term is the logarithm of
the normalisation constant of q_i [19, 118]. In our case, we group the variables
according to their priors by {W, τ}, {α}, {V}, {β}, {Z}.
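The factor-wise update described above can be illustrated on a toy problem. The following is a minimal sketch, not the model of this chapter: it assumes two discrete hidden variables a and b with a known joint table, and the names normalise_log and mean_field are hypothetical. Each factor is set to the exponentiated expectation of the log-joint under the other factor, and the additive constant is absorbed by normalisation.

```python
import math

def normalise_log(log_weights):
    """Turn unnormalised log-weights into a probability vector."""
    m = max(log_weights)
    w = [math.exp(v - m) for v in log_weights]
    total = sum(w)
    return [v / total for v in w]

def mean_field(log_joint, iters=50):
    """Coordinate ascent for q(a, b) = q_a(a) q_b(b) on a discrete joint.

    log_joint[a][b] holds ln p(a, b, data) up to an additive constant.
    """
    n_a, n_b = len(log_joint), len(log_joint[0])
    q_a = [1.0 / n_a] * n_a
    q_b = [1.0 / n_b] * n_b
    for _ in range(iters):
        # ln q_a(a) = E_{q_b}[ln p(a, b)] + const.
        q_a = normalise_log([sum(q_b[b] * log_joint[a][b] for b in range(n_b))
                             for a in range(n_a)])
        # ln q_b(b) = E_{q_a}[ln p(a, b)] + const.
        q_b = normalise_log([sum(q_a[a] * log_joint[a][b] for a in range(n_a))
                             for b in range(n_b)])
    return q_a, q_b

# Hypothetical joint that factorises exactly, so both factors are recovered.
p_a, p_b = [0.3, 0.7], [0.6, 0.4]
log_joint = [[math.log(pa * pb) for pb in p_b] for pa in p_a]
q_a, q_b = mean_field(log_joint)
```

For a joint that does not factorise, the same loop still converges to a local optimum of the bound, but the factors then only approximate the true marginals.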
Handling the Softmax Function
If the model has a conjugate-exponential structure, (7.24) gives an analytical
solution whose distribution has the same form as the prior of the corresponding
hidden variable. However, in our case the generalised softmax function (7.10)
does not conform to this conjugate-exponential structure and needs to be dealt
with separately. A possible approach is to replace the softmax function by an
exponential lower bound on it, which introduces additional variational variables
with respect to which L(q) also needs to be maximised. This approach was
followed by Bishop and Svensén [20] and Jaakkola and Jordan [119] for the
logistic sigmoid function, but currently there is no known exponential lower
bound on the softmax besides a conjectured one by Gibbs [93] (see footnote 4).
Alternatively, we can follow the approach taken by Waterhouse et al. [227, 226],
where q_V(V) is approximated by a Laplace approximation. Due to the lack of
better alternatives, this approach is chosen, even though such an approximation
invalidates the lower-bound nature of L(q).
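To make the idea of a Laplace approximation concrete, here is a minimal sketch on a deliberately simplified stand-in problem. It assumes a single scalar parameter v, a logistic sigmoid likelihood with labels t_n in {-1, +1}, and a zero-mean Gaussian prior with precision beta; the data and the function names are hypothetical and are not the softmax model of this chapter. The approximation places a Gaussian at the mode of the unnormalised log-density, found here by Newton's method, with variance equal to the negative inverse of the second derivative at that mode.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def grad_hess(v, xs, ts, beta):
    """First and second derivative of
    ln f(v) = sum_n ln sigmoid(t_n x_n v) - beta v^2 / 2."""
    grad, hess = -beta * v, -beta
    for x, t in zip(xs, ts):
        p = sigmoid(t * x * v)
        grad += t * x * (1.0 - p)
        hess -= x * x * p * (1.0 - p)  # uses t_n^2 = 1
    return grad, hess

def laplace_approximation(xs, ts, beta, iters=25):
    """Return (mode, variance) of the Gaussian fitted at the mode of f."""
    v = 0.0
    for _ in range(iters):
        grad, hess = grad_hess(v, xs, ts, beta)
        v -= grad / hess  # Newton step towards the mode
    _, hess = grad_hess(v, xs, ts, beta)
    return v, -1.0 / hess

# Hypothetical, non-separable toy data.
xs = [1.0, 2.0, -1.0, 0.5]
ts = [1, 1, -1, -1]
mode, variance = laplace_approximation(xs, ts, beta=1.0)
```

Because the Gaussian is fitted locally at the mode rather than derived from a bound, nothing guarantees that the resulting value of the objective stays below the true log-evidence, which is the drawback noted in the text.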
Update Equations and Model Posterior
To get the update equations for the parameters of the variational distribution,
we need to evaluate (7.24) for each group of hidden variables in U separately,
similar to the derivations by Waterhouse et al. [226] and Ueda and Ghahramani
[216]. This provides us with an approximation for the posterior p(U | Y) and will
be shown in the following sections.
Approximating the model evidence p(Y) requires a closed-form expression for
L(q), obtained by evaluating (7.21). Many terms of the variational update
equations can be reused for this, as will be shown after the update equations
have been derived.
7.3.2 Classifier Model q_{W,τ}(W, τ)
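The maximum of L(q) with respect to W and τ is given by evaluating (7.24)
for q_{W,τ}, which, by using (7.15), (7.16) and (7.6), results in

\begin{align}
\ln q_{W,\tau}(W, \tau)
  &= \mathbb{E}_Z\bigl(\ln p(Y \mid W, \tau, Z)\bigr)
   + \mathbb{E}_\alpha\bigl(\ln p(W, \tau \mid \alpha)\bigr) + \text{const.} \nonumber \\
  &= \sum_n \sum_k \mathbb{E}_Z\bigl(z_{nk} \ln p(y_n \mid W_k, \tau_k)\bigr)
   + \sum_k \mathbb{E}_\alpha\bigl(\ln p(W_k, \tau_k \mid \alpha_k)\bigr) + \text{const.},
  \tag{7.25}
\end{align}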
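Since ln p(y_n | W_k, τ_k) does not depend on Z, the expectation in the first term of (7.25) only acts on z_nk, so the term becomes a sum of log-likelihoods weighted by E_Z(z_nk). The sketch below evaluates that weighted sum under assumptions that are not taken from this section: scalar inputs and outputs, a linear Gaussian classifier model with mean w_k x_n and precision τ_k, and a hypothetical matrix r with r[n][k] standing for E_Z(z_nk).

```python
import math

def gaussian_log_pdf(y, mean, tau):
    """ln N(y | mean, 1/tau) for a scalar y and precision tau."""
    return 0.5 * (math.log(tau) - math.log(2.0 * math.pi)) \
        - 0.5 * tau * (y - mean) ** 2

def expected_log_likelihood(xs, ys, r, w, tau):
    """sum_n sum_k r[n][k] * ln p(y_n | w_k, tau_k), assuming a linear
    Gaussian model per classifier k (an illustrative assumption)."""
    total = 0.0
    for n in range(len(xs)):
        for k in range(len(w)):
            total += r[n][k] * gaussian_log_pdf(ys[n], w[k] * xs[n], tau[k])
    return total

# Hypothetical data: two observations, two classifiers.
xs, ys = [1.0, 2.0], [2.0, 1.0]
w, tau = [2.0, 0.5], [1.0, 4.0]
r = [[0.9, 0.1], [0.2, 0.8]]  # each row sums to one
value = expected_log_likelihood(xs, ys, r, w, tau)
```

Each classifier k only sees the observations in proportion to its weights r[n][k], which is why the update for (W_k, τ_k) can be carried out for each classifier separately.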
4 A more general bound was recently developed by Wainwright, Jaakkola and Willsky
[225], but its applicability still needs to be evaluated.
 