Specifically, the objective function is
$$
\mathcal{L} = \sum_{k=1}^{K} \mathrm{E}[\log p(\beta_k \,|\, \eta)] + \sum_{d=1}^{D} \mathrm{E}[\log p(\theta_d \,|\, \alpha)] + \sum_{d=1}^{D} \sum_{n=1}^{N} \mathrm{E}[\log p(z_{d,n} \,|\, \theta_d)]
+ \sum_{d=1}^{D} \sum_{n=1}^{N} \mathrm{E}[\log p(w_{d,n} \,|\, z_{d,n}, \beta_{1:K})] + \mathrm{H}(q), \qquad (4.6)
$$
where H denotes the entropy and all expectations are taken with respect to the variational distribution in Eq. (4.5). See (10) for details on how to compute this function. Optimization proceeds by coordinate ascent, iteratively optimizing each variational parameter to increase the objective.
Mean field variational inference for LDA is discussed in detail in (10), and
good introductions to variational methods include (21) and (37). Here, we
will focus on the variational inference algorithm for the LDA model and try
to provide more intuition for how it learns topics from otherwise unstructured
text.
One iteration of the mean field variational inference algorithm performs
the coordinate ascent updates in Figure 4.5, and these updates are repeated
until the objective function converges. Each update has a close relationship
to the true posterior of each hidden random variable conditioned on the other
hidden and observed random variables.
Consider the variational Dirichlet parameter for the $k$th topic. The true posterior Dirichlet parameter for a term given all of the topic assignments and words is a Dirichlet with parameters $\eta + n_{k,w}$, where $n_{k,w}$ denotes the number of times word $w$ is assigned to topic $k$. (This follows from the conjugacy of the Dirichlet and multinomial. See (17) for a good introduction to this concept.)
The update in Eq. (4.8) is nearly this expression, but with $n_{k,w}$ replaced by its expectation under the variational distribution. The independence of the hidden variables in the variational distribution guarantees that such an expectation will not depend on the parameter being updated. The variational update for the topic proportions in Eq. (4.9) is analogous.
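This count-replacement update can be sketched in Python. The code below is illustrative, not from the text: `phi[d][n]` is assumed to hold the variational distribution $q(z_{d,n})$ over the $K$ topics, and `docs[d][n]` the vocabulary index of the $n$th word of document $d$.

```python
import numpy as np

def update_lambda(phi, docs, eta, K, V):
    """Update the variational Dirichlet parameters for the K topics.

    Replaces the count n_{k,w} in the true posterior parameter
    eta + n_{k,w} by its expectation under the variational
    distribution (the sum of the responsibilities phi).
    """
    lam = np.full((K, V), eta)          # prior pseudo-count eta for every term
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            # expected number of times word w is assigned to each topic
            lam[:, w] += phi[d][n]
    return lam
```

Because the variational distribution factorizes, each expected count is a simple sum over the per-word topic responsibilities, independent of the parameter being updated.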
The variational update for the distribution of $z_{d,n}$ follows a similar formula. Consider the true posterior of $z_{d,n}$, given the other relevant hidden variables and observed word $w_{d,n}$,
$$
p(z_{d,n} = k \,|\, \theta_d, w_{d,n}, \beta_{1:K}) \propto \exp\{ \log \theta_{d,k} + \log \beta_{k,w_{d,n}} \}. \qquad (4.7)
$$
The update in Eq. (4.10) is this distribution, with the term inside the exponent replaced by its expectation under the variational distribution. Note that under the variational Dirichlet distribution, $\mathrm{E}[\log \beta_{k,w}] = \Psi(\lambda_{k,w}) - \Psi(\sum_v \lambda_{k,v})$, where $\Psi$ denotes the digamma function, and $\mathrm{E}[\log \theta_{d,k}]$ is computed similarly.
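The digamma expectations and the exponentiate-and-normalize step can be sketched as follows. This is a schematic, with assumed names: `gamma_d` is the variational Dirichlet parameter for document $d$'s topic proportions, `lam` the $K \times V$ matrix of topic parameters, and `w` the vocabulary index of $w_{d,n}$.

```python
import numpy as np
from scipy.special import digamma

def update_phi_dn(gamma_d, lam, w):
    """Variational update for q(z_{d,n}): the exponent of Eq. (4.7)
    with each log term replaced by its variational expectation."""
    # E[log theta_{d,k}] and E[log beta_{k,w}] under the variational
    # Dirichlets, computed via the digamma function Psi
    e_log_theta = digamma(gamma_d) - digamma(gamma_d.sum())
    e_log_beta = digamma(lam[:, w]) - digamma(lam.sum(axis=1))
    log_phi = e_log_theta + e_log_beta
    log_phi -= log_phi.max()            # subtract max for numerical stability
    phi = np.exp(log_phi)
    return phi / phi.sum()              # normalize to a distribution over topics
```

Subtracting the maximum before exponentiating leaves the normalized result unchanged but avoids underflow when the expected log probabilities are very negative.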
This general approach to mean field variational methods—update each variational parameter with the parameter given by the expectation of the true