FIGURE 4.6: The graphical model for the correlated topic model in Section 4.4.1.
4.4.2 The Dynamic Topic Model
LDA and the CTM assume that words are exchangeable within each docu-
ment, i.e., their order does not affect their probability under the model. This
assumption is a simplification that is consistent with the goal of identifying
the semantic themes within each document.
But LDA and the CTM further assume that documents are exchangeable
within the corpus, and, for many corpora, this assumption is inappropri-
ate. Scholarly journals, email, news articles, and search query logs all reflect
evolving content. For example, the Science articles “The Brain of Professor
Laborde” and “Reshaping the Cortical Motor Map by Unmasking Latent In-
tracortical Connections” may both concern aspects of neuroscience, but the
field of neuroscience looked much different in 1903 than it did in 1991. The
topics of a document collection evolve over time. In this section, we describe
how to explicitly model and uncover the dynamics of the underlying topics.
The dynamic topic model (DTM) captures the evolution of topics in a se-
quentially organized corpus of documents. In the DTM, we divide the data
by time slice, e.g., by year. We model the documents of each slice with a K-component topic model, where the topics associated with slice t evolve from the topics associated with slice t − 1.
Again, we avail ourselves of the logistic normal distribution, this time using
it to capture uncertainty about the time-series topics. We model sequences of
simplicial random variables by chaining Gaussian distributions in a dynamic
model and mapping the emitted values to the simplex. This is an extension
of the logistic normal to time-series simplex data (39).
For a K-component model with V terms, let π_{t,k} denote a multivariate
Gaussian random variable for topic k in slice t. For each topic, we chain
{π_{1,k}, ..., π_{T,k}} in a state space model that evolves with Gaussian noise:

    π_{t,k} | π_{t−1,k} ∼ N(π_{t−1,k}, σ² I).    (4.16)
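As a minimal numerical sketch of this chain (with an illustrative vocabulary size, number of slices, and a hypothetical noise level σ, and using the softmax map that realizes the logistic normal of Eq. (4.15)):

```python
import numpy as np

rng = np.random.default_rng(0)

V = 5        # vocabulary size (illustrative)
T = 4        # number of time slices (illustrative)
sigma = 0.1  # evolution noise; a hypothetical value

# Chain pi_{1,k}, ..., pi_{T,k} for a single topic k, as in Eq. (4.16):
# pi_{t,k} | pi_{t-1,k} ~ N(pi_{t-1,k}, sigma^2 I).
pi = np.zeros((T, V))
pi[0] = rng.normal(0.0, 1.0, size=V)    # initial natural parameters
for t in range(1, T):
    pi[t] = rng.normal(pi[t - 1], sigma)

def to_simplex(x):
    """Map natural parameters to the simplex (the softmax map of Eq. (4.15))."""
    e = np.exp(x - x.max())             # shift by max for numerical stability
    return e / e.sum()

# Each slice's topic is then a distribution over the V terms;
# nearby slices yield similar distributions because sigma is small.
topics = np.array([to_simplex(pi[t]) for t in range(T)])
```

Because the Gaussian noise is small, the term distributions for adjacent slices stay close, which is exactly the smooth topic drift the DTM is designed to capture.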
When drawing words from these topics, we map the natural parameters back
to the simplex with the function f from Eq. (4.15). Note that the time-series
 