[Figure: five columns of topic terms (e.g. "contractual", "employment", "discrimination", "harassment", "parole"), omitted.]
FIGURE 4.3: Five topics from a 50-topic model fit to the Yale Law Journal from 1980–2003.
4.2.2 Exploring a Corpus with the Posterior Distribution
LDA provides a joint distribution over the observed and hidden random
variables. The hidden topic decomposition of a particular corpus arises from
the corresponding posterior distribution of the hidden variables given the D
observed documents w_{1:D},

\[
p(\theta_{1:D}, z_{1:D,1:N}, \beta_{1:K} \mid w_{1:D,1:N}, \alpha, \eta)
  = \frac{p(\theta_{1:D}, z_{1:D}, \beta_{1:K}, w_{1:D} \mid \alpha, \eta)}
         {\int_{\beta_{1:K}} \int_{\theta_{1:D}} \sum_{z} p(\theta_{1:D}, z_{1:D}, \beta_{1:K}, w_{1:D} \mid \alpha, \eta)}.
\tag{4.2}
\]
Loosely, this posterior can be thought of as the "reversal" of the generative
process described above. Given the observed corpus, the posterior is a
distribution over the hidden variables that generated it.
As discussed in (10), this distribution is intractable to compute because of
the integral in the denominator. Before discussing approximation methods,
however, we illustrate how the posterior distribution gives a decomposition of
the corpus that can be used to better understand and organize its contents.
The quantities needed for exploring a corpus are the posterior expectations
of the hidden variables. These are the topic probability of a term,
\hat{\beta}_{k,v} = E[\beta_{k,v} \mid w_{1:D,1:N}]; the topic proportions of a
document, \hat{\theta}_{d,k} = E[\theta_{d,k} \mid w_{1:D,1:N}]; and the topic
assignment of a word, \hat{z}_{d,n,k} = E[Z_{d,n} = k \mid w_{1:D,1:N}]. Note that
each of these quantities is conditioned on the observed corpus.
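As one way to obtain approximations of these expectations in practice, scikit-learn's variational LDA exposes estimates of \hat{\theta} and \hat{\beta} after fitting. The count matrix below is invented purely for illustration:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Tiny synthetic document-term count matrix: 6 documents over an
# 8-term vocabulary (invented data, purely illustrative).
rng = np.random.default_rng(0)
X = rng.integers(1, 5, size=(6, 8))

lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta_hat = lda.fit_transform(X)   # each row approximates the document's topic proportions

# components_ holds unnormalized variational topic-term weights;
# normalizing each row gives estimated term probabilities beta_hat[k, v].
beta_hat = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
```

Each row of `theta_hat` and `beta_hat` is a probability distribution, matching the conditional expectations defined above.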
Visualizing a topic. Exploring a corpus through a topic model typically
begins with visualizing the posterior topics through their per-topic term
probabilities \hat{\beta}. The simplest way to visualize a topic is to order the
terms by their probability. However, we prefer the following score:
\[
\text{term-score}_{k,v} = \hat{\beta}_{k,v}
  \log \frac{\hat{\beta}_{k,v}}{\left( \prod_{j=1}^{K} \hat{\beta}_{j,v} \right)^{1/K}}.
\tag{4.3}
\]