variational inference (10), collapsed variational inference (36), expectation
propagation (26), and Gibbs sampling (33). Each has advantages and disadvantages:
choosing an approximate inference algorithm amounts to trading off
speed, complexity, accuracy, and conceptual simplicity. A thorough comparison
of these techniques is not our goal here; we use the mean field variational
approach throughout this chapter.
4.3.1 Mean Field Variational Inference
The basic idea behind variational inference is to approximate an intractable
posterior distribution over hidden variables, such as Eq. (4.2), with a simpler
distribution containing free variational parameters. These parameters are
then fit so that the approximation is close to the true posterior.
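For intuition, consider a toy one-dimensional version of this idea: approximate a non-Gaussian density, known only up to a constant, with a Gaussian whose mean and variance play the role of free variational parameters. The sketch below is a minimal illustration only; the particular target density and the grid-search fitting procedure are illustrative choices, not part of the original development.

```python
import numpy as np

# Illustrative "posterior" over one hidden variable, known only up to a constant.
x = np.linspace(-5, 5, 2001)
dx = x[1] - x[0]
p = np.exp(-x**4 / 4 + x**2 / 2)   # unnormalized target density (illustrative choice)
p /= p.sum() * dx                  # normalize numerically on the grid

def kl_q_to_p(mu, sigma):
    """KL(q || p) for a Gaussian q(x; mu, sigma), estimated on the grid."""
    q = np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    mask = q > 1e-12               # avoid 0 * log 0 in the tails
    return np.sum(q[mask] * (np.log(q[mask]) - np.log(p[mask])) * dx)

# Fit the free variational parameters (mu, sigma) by a crude grid search.
candidates = [(mu, sigma)
              for mu in np.linspace(-2.0, 2.0, 41)
              for sigma in np.linspace(0.2, 2.0, 37)]
mu_hat, sigma_hat = min(candidates, key=lambda ps: kl_q_to_p(*ps))
print(f"fitted variational parameters: mu={mu_hat:.2f}, sigma={sigma_hat:.2f}")
```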
The LDA posterior is intractable to compute exactly because the hidden
variables (i.e., the components of the hidden topic structure) are dependent
when conditioned on data. Specifically, this dependence yields difficulty in
computing the denominator in Eq. (4.2) because one must sum over all
configurations of the interdependent N topic assignment variables z_{1:N}.
In contrast to the true posterior, the mean field variational distribution
for LDA is one where the variables are independent of each other, with each
governed by a different variational parameter:
\[
q(\theta_{1:D}, z_{1:D,1:N}, \beta_{1:K}) \;=\; \prod_{k=1}^{K} q(\beta_k \mid \lambda_k) \, \prod_{d=1}^{D} \Big( q(\theta_d \mid \gamma_d) \prod_{n=1}^{N} q(z_{d,n} \mid \phi_{d,n}) \Big)
\tag{4.5}
\]
Each hidden variable is described by a distribution over its type: the topics
β_{1:K} are each described by a V-Dirichlet distribution λ_k; the topic proportions
θ_{1:D} are each described by a K-Dirichlet distribution γ_d; and the topic
assignment z_{d,n} is described by a K-multinomial distribution φ_{d,n}. We
emphasize that in the variational distribution these variables are independent;
in the true posterior they are coupled through the observed documents.
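To make the bookkeeping concrete, the following sketch allocates these variational parameters for a small synthetic corpus: one V-Dirichlet vector λ_k per topic, one K-Dirichlet vector γ_d per document, and one K-multinomial vector φ_{d,n} per word. The corpus sizes, initialization choices, and array names are illustrative assumptions only.

```python
import numpy as np

# Illustrative sizes for a synthetic corpus (assumptions, not from the text).
K = 10      # number of topics
V = 5000    # vocabulary size
D = 100     # number of documents
N = 200     # words per document (all documents equal length, for simplicity)

rng = np.random.default_rng(0)

# lambda_[k] is the V-Dirichlet parameter vector for topic beta_k.
lambda_ = rng.gamma(shape=100.0, scale=0.01, size=(K, V))

# gamma[d] is the K-Dirichlet parameter vector for topic proportions theta_d.
gamma = np.ones((D, K))

# phi[d, n] is the K-multinomial parameter vector for topic assignment z_{d,n}.
phi = np.full((D, N, K), 1.0 / K)
```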
With the variational distribution in hand, we fit its variational parameters
to minimize the Kullback-Leibler (KL) divergence to the true posterior:
\[
\arg\min_{\gamma_{1:D},\, \lambda_{1:K},\, \phi_{1:D,1:N}} \; \mathrm{KL}\big( q(\theta_{1:D}, z_{1:D,1:N}, \beta_{1:K}) \,\big\|\, p(\theta_{1:D}, z_{1:D,1:N}, \beta_{1:K} \mid w_{1:D,1:N}) \big)
\]
The objective cannot be computed exactly, but it can be computed up to a
constant that does not depend on the variational parameters. (In fact, this
constant is the log likelihood of the data under the model.)
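Because this constant is the log likelihood, minimizing the KL divergence amounts to maximizing the computable part of the objective (the evidence lower bound), and a common strategy is coordinate ascent on the variational parameters. The sketch below shows the standard mean field updates for a single document's γ_d and φ_{d,n}, holding the topic parameters λ fixed; it assumes a symmetric Dirichlet hyperparameter α on the topic proportions, and the function and variable names are illustrative rather than taken from the chapter.

```python
import numpy as np
from scipy.special import digamma

def update_document(word_ids, lambda_, alpha, n_iters=50):
    """Coordinate-ascent updates for one document under the mean field family.

    word_ids : array of word indices w_{d,1}, ..., w_{d,N}
    lambda_  : (K, V) variational Dirichlet parameters for the topics
    alpha    : symmetric Dirichlet hyperparameter on topic proportions
    """
    K, _ = lambda_.shape
    N = len(word_ids)

    # Expected log topic-word probabilities E_q[log beta_{k, v}].
    Elog_beta = digamma(lambda_) - digamma(lambda_.sum(axis=1, keepdims=True))

    gamma_d = np.full(K, alpha + N / K)      # initialize gamma_d
    phi_d = np.full((N, K), 1.0 / K)         # initialize phi_{d,n}

    for _ in range(n_iters):
        Elog_theta = digamma(gamma_d) - digamma(gamma_d.sum())
        # phi_{d,n,k} proportional to exp(E[log theta_{d,k}] + E[log beta_{k, w_{d,n}}]).
        log_phi = Elog_theta[None, :] + Elog_beta[:, word_ids].T
        log_phi -= log_phi.max(axis=1, keepdims=True)
        phi_d = np.exp(log_phi)
        phi_d /= phi_d.sum(axis=1, keepdims=True)
        # gamma_{d,k} = alpha + sum_n phi_{d,n,k}.
        gamma_d = alpha + phi_d.sum(axis=0)

    return gamma_d, phi_d
```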