Chapter 4
Topic Models
David M. Blei and John D. Lafferty
4.1 Introduction
4.2 Latent Dirichlet Allocation
4.3 Posterior Inference for LDA
4.4 Dynamic Topic Models and Correlated Topic Models
4.5 Discussion
4.1 Introduction
Scientists need new tools to explore and browse large collections of scholarly
literature. Thanks to organizations such as JSTOR, which scan and index the
original bound archives of many journals, modern scientists can search digital
libraries spanning hundreds of years. A scientist, suddenly faced with access to
millions of articles in her field, is not satisfied with simple search. Effectively
using such collections requires interacting with them in a more structured
way: finding articles similar to those of interest, and exploring the collection
through the underlying topics that run through it.
The central problem is that this structure—the index of ideas contained
in the articles and which other articles are about the same kinds of ideas—
is not readily available in most modern collections, and the size and growth
rate of these collections preclude us from building it by hand. To develop the
necessary tools for exploring and browsing modern digital libraries, we require
automated methods of organizing, managing, and delivering their contents.
In this chapter, we describe topic models, probabilistic models for uncovering
the underlying semantic structure of a document collection based on a
hierarchical Bayesian analysis of the original texts (10; 18; 11; 20; 12). Topic
models have been applied to many kinds of documents, including email (42),
scientific abstracts (18; 10), and newspaper archives (38). By discovering
patterns of word use and connecting documents that exhibit similar patterns,
topic models have emerged as a powerful new technique for finding useful
structure in an otherwise unstructured collection.