Specifically, the objective function is
$$
\mathcal{L} = \sum_{k=1}^{K} \mathrm{E}[\log p(\beta_k \,|\, \eta)] + \sum_{d=1}^{D} \mathrm{E}[\log p(\theta_d \,|\, \alpha)] + \sum_{d=1}^{D} \sum_{n=1}^{N} \mathrm{E}[\log p(z_{d,n} \,|\, \theta_d)]
+ \sum_{d=1}^{D} \sum_{n=1}^{N} \mathrm{E}[\log p(w_{d,n} \,|\, z_{d,n}, \beta_{1:K})] + \mathrm{H}(q), \qquad (4.6)
$$
where H denotes the entropy and all expectations are taken with respect to the variational distribution in Eq. (4.5). See (10) for details on how to compute this function. Optimization proceeds by coordinate ascent, iteratively optimizing each variational parameter to increase the objective.
Mean field variational inference for LDA is discussed in detail in (10), and
good introductions to variational methods include (21) and (37). Here, we
will focus on the variational inference algorithm for the LDA model and try
to provide more intuition for how it learns topics from otherwise unstructured
text.
One iteration of the mean field variational inference algorithm performs
the coordinate ascent updates in Figure 4.5, and these updates are repeated
until the objective function converges. Each update has a close relationship
to the true posterior of each hidden random variable conditioned on the other
hidden and observed random variables.
Consider the variational Dirichlet parameter for the $k$th topic. The true posterior Dirichlet parameter for a term given all of the topic assignments and words is a Dirichlet with parameters $\eta + n_{k,w}$, where $n_{k,w}$ denotes the number of times word $w$ is assigned to topic $k$. (This follows from the conjugacy of the Dirichlet and multinomial. See (17) for a good introduction to this concept.)
The update in Eq. (4.8) is nearly this expression, but with $n_{k,w}$ replaced by its expectation under the variational distribution. The independence of the hidden variables in the variational distribution guarantees that such an expectation will not depend on the parameter being updated. The variational update for the topic proportions in Eq. (4.9) is analogous.
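This count-replacement update can be sketched in Python. The code below is illustrative, not from the text: `phi[d][n]` is assumed to hold the variational distribution $q(z_{d,n})$ over the $K$ topics, and `docs[d][n]` the vocabulary index of the $n$th word of document $d$.

```python
import numpy as np

def update_lambda(phi, docs, eta, K, V):
    """Update the variational Dirichlet parameters for the K topics.

    Replaces the count n_{k,w} in the true posterior parameter
    eta + n_{k,w} by its expectation under the variational
    distribution (the sum of the responsibilities phi).
    """
    lam = np.full((K, V), eta)          # prior pseudo-count eta for every term
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            # expected number of times word w is assigned to each topic
            lam[:, w] += phi[d][n]
    return lam
```

Because the variational distribution factorizes, each expected count is a simple sum over the per-word topic responsibilities, independent of the parameter being updated.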
The variational update for the distribution of $z_{d,n}$ follows a similar formula. Consider the true posterior of $z_{d,n}$, given the other relevant hidden variables and observed word $w_{d,n}$,
$$
p(z_{d,n} = k \,|\, \theta_d, w_{d,n}, \beta_{1:K}) \propto \exp\{ \log \theta_{d,k} + \log \beta_{k,w_{d,n}} \}. \qquad (4.7)
$$
The update in Eq. (4.10) is this distribution, with the term inside the exponent replaced by its expectation under the variational distribution. Note that under the variational Dirichlet distribution, $\mathrm{E}[\log \beta_{k,w}] = \Psi(\lambda_{k,w}) - \Psi(\sum_v \lambda_{k,v})$, where $\Psi$ denotes the digamma function, and $\mathrm{E}[\log \theta_{d,k}]$ is computed similarly.
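The digamma expectations and the exponentiate-and-normalize step can be sketched as follows. This is a schematic, with assumed names: `gamma_d` is the variational Dirichlet parameter for document $d$'s topic proportions, `lam` the $K \times V$ matrix of topic parameters, and `w` the vocabulary index of $w_{d,n}$.

```python
import numpy as np
from scipy.special import digamma

def update_phi_dn(gamma_d, lam, w):
    """Variational update for q(z_{d,n}): the exponent of Eq. (4.7)
    with each log term replaced by its variational expectation."""
    # E[log theta_{d,k}] and E[log beta_{k,w}] under the variational
    # Dirichlets, computed via the digamma function Psi
    e_log_theta = digamma(gamma_d) - digamma(gamma_d.sum())
    e_log_beta = digamma(lam[:, w]) - digamma(lam.sum(axis=1))
    log_phi = e_log_theta + e_log_beta
    log_phi -= log_phi.max()            # subtract max for numerical stability
    phi = np.exp(log_phi)
    return phi / phi.sum()              # normalize to a distribution over topics
```

Subtracting the maximum before exponentiating leaves the normalized result unchanged but avoids underflow when the expected log probabilities are very negative.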
This general approach to mean field variational methods—update each variational parameter with the parameter given by the expectation of the true