variational inference (10), collapsed variational inference (36), expectation
propagation (26), and Gibbs sampling (33). Each has advantages and disadvantages:
choosing an approximate inference algorithm amounts to trading off
speed, complexity, accuracy, and conceptual simplicity. A thorough comparison
of these techniques is not our goal here; we use the mean field variational
approach throughout this chapter.
4.3.1 Mean Field Variational Inference
The basic idea behind variational inference is to approximate an intractable
posterior distribution over hidden variables, such as Eq. (4.2), with a simpler
distribution containing free variational parameters. These parameters are
then fit so that the approximation is close to the true posterior.
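For intuition, consider a toy one-dimensional version of this idea: approximate a non-Gaussian density, known only up to a constant, with a Gaussian whose mean and variance play the role of free variational parameters. The sketch below is a minimal illustration only; the particular target density and the grid-search fitting procedure are illustrative choices, not part of the original development.

```python
import numpy as np

# Illustrative "posterior" over one hidden variable, known only up to a constant.
x = np.linspace(-5, 5, 2001)
dx = x[1] - x[0]
p = np.exp(-x**4 / 4 + x**2 / 2)   # unnormalized target density (illustrative choice)
p /= p.sum() * dx                  # normalize numerically on the grid

def kl_q_to_p(mu, sigma):
    """KL(q || p) for a Gaussian q(x; mu, sigma), estimated on the grid."""
    q = np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    mask = q > 1e-12               # avoid 0 * log 0 in the tails
    return np.sum(q[mask] * (np.log(q[mask]) - np.log(p[mask])) * dx)

# Fit the free variational parameters (mu, sigma) by a crude grid search.
candidates = [(mu, sigma)
              for mu in np.linspace(-2.0, 2.0, 41)
              for sigma in np.linspace(0.2, 2.0, 37)]
mu_hat, sigma_hat = min(candidates, key=lambda ps: kl_q_to_p(*ps))
print(f"fitted variational parameters: mu={mu_hat:.2f}, sigma={sigma_hat:.2f}")
```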
The LDA posterior is intractable to compute exactly because the hidden
variables (i.e., the components of the hidden topic structure) are dependent
when conditioned on data. Specifically, this dependence yields difficulty in
computing the denominator in Eq. (4.2) because one must sum over all
configurations of the interdependent N topic assignment variables z_{1:N}.
In contrast to the true posterior, the mean field variational distribution
for LDA is one where the variables are independent of each other, with each
governed by a different variational parameter:
\[
q(\theta_{1:D}, z_{1:D,1:N}, \beta_{1:K}) \;=\; \prod_{k=1}^{K} q(\beta_k \mid \lambda_k) \, \prod_{d=1}^{D} \Big( q(\theta_d \mid \gamma_d) \prod_{n=1}^{N} q(z_{d,n} \mid \phi_{d,n}) \Big)
\tag{4.5}
\]
Each hidden variable is described by a distribution over its type: the topics
β_{1:K} are each described by a V-Dirichlet distribution λ_k; the topic proportions
θ_{1:D} are each described by a K-Dirichlet distribution γ_d; and the topic
assignment z_{d,n} is described by a K-multinomial distribution φ_{d,n}. We
emphasize that in the variational distribution these variables are independent;
in the true posterior they are coupled through the observed documents.
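To make the bookkeeping concrete, the following sketch allocates these variational parameters for a small synthetic corpus: one V-Dirichlet vector λ_k per topic, one K-Dirichlet vector γ_d per document, and one K-multinomial vector φ_{d,n} per word. The corpus sizes, initialization choices, and array names are illustrative assumptions only.

```python
import numpy as np

# Illustrative sizes for a synthetic corpus (assumptions, not from the text).
K = 10      # number of topics
V = 5000    # vocabulary size
D = 100     # number of documents
N = 200     # words per document (all documents equal length, for simplicity)

rng = np.random.default_rng(0)

# lambda_[k] is the V-Dirichlet parameter vector for topic beta_k.
lambda_ = rng.gamma(shape=100.0, scale=0.01, size=(K, V))

# gamma[d] is the K-Dirichlet parameter vector for topic proportions theta_d.
gamma = np.ones((D, K))

# phi[d, n] is the K-multinomial parameter vector for topic assignment z_{d,n}.
phi = np.full((D, N, K), 1.0 / K)
```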
With the variational distribution in hand, we fit its variational parameters
to minimize the Kullback-Leibler (KL) divergence to the true posterior:
\[
\arg\min_{\gamma_{1:D},\, \lambda_{1:K},\, \phi_{1:D,1:N}} \; \mathrm{KL}\big( q(\theta_{1:D}, z_{1:D,1:N}, \beta_{1:K}) \,\big\|\, p(\theta_{1:D}, z_{1:D,1:N}, \beta_{1:K} \mid w_{1:D,1:N}) \big)
\]
The objective cannot be computed exactly, but it can be computed up to a
constant that does not depend on the variational parameters. (In fact, this
constant is the log likelihood of the data under the model.)
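Because this constant is the log likelihood, minimizing the KL divergence amounts to maximizing the computable part of the objective (the evidence lower bound), and a common strategy is coordinate ascent on the variational parameters. The sketch below shows the standard mean field updates for a single document's γ_d and φ_{d,n}, holding the topic parameters λ fixed; it assumes a symmetric Dirichlet hyperparameter α on the topic proportions, and the function and variable names are illustrative rather than taken from the chapter.

```python
import numpy as np
from scipy.special import digamma

def update_document(word_ids, lambda_, alpha, n_iters=50):
    """Coordinate-ascent updates for one document under the mean field family.

    word_ids : array of word indices w_{d,1}, ..., w_{d,N}
    lambda_  : (K, V) variational Dirichlet parameters for the topics
    alpha    : symmetric Dirichlet hyperparameter on topic proportions
    """
    K, _ = lambda_.shape
    N = len(word_ids)

    # Expected log topic-word probabilities E_q[log beta_{k, v}].
    Elog_beta = digamma(lambda_) - digamma(lambda_.sum(axis=1, keepdims=True))

    gamma_d = np.full(K, alpha + N / K)      # initialize gamma_d
    phi_d = np.full((N, K), 1.0 / K)         # initialize phi_{d,n}

    for _ in range(n_iters):
        Elog_theta = digamma(gamma_d) - digamma(gamma_d.sum())
        # phi_{d,n,k} proportional to exp(E[log theta_{d,k}] + E[log beta_{k, w_{d,n}}]).
        log_phi = Elog_theta[None, :] + Elog_beta[:, word_ids].T
        log_phi -= log_phi.max(axis=1, keepdims=True)
        phi_d = np.exp(log_phi)
        phi_d /= phi_d.sum(axis=1, keepdims=True)
        # gamma_{d,k} = alpha + sum_n phi_{d,n,k}.
        gamma_d = alpha + phi_d.sum(axis=0)

    return gamma_d, phi_d
```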