Query: Automatic Analysis, Theme Generation, and Summarization of Machine-Readable Texts (1994)

 1. Global Text Matching for Information Retrieval (1991)
 2. Automatic Text Analysis (1970)
 3. Language-Independent Categorization of Text (1995)
 4. Developments in Automatic Text Retrieval (1991)
 5. Simple and Rapid Method for the Coding of Punched Cards (1962)
 6. Data Processing by Optical Coincidence (1961)
 7. Pattern-Analyzing Memory (1976)
 8. The Storing of Pamphlets (1899)
 9. A Punched-Card Technique for Computing Means (1946)
10. Database Systems (1982)

FIGURE 4.10: The top ten most similar articles to the query in Science
(1880-2002), scored by Eq. (4.4) using the posterior distribution from the
dynamic topic model.
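
A ranking like the one in Figure 4.10 can be sketched in a few lines. Eq. (4.4) is not reproduced in this section, so the sketch below assumes it is a Hellinger-style distance between per-document topic proportions (the sum over topics of squared differences of square roots). The function names, the toy documents, and their proportions are hypothetical and serve only as illustration; in practice the proportions would come from the fitted model's posterior.

```python
import math


def hellinger_sq(p, q):
    """Sum over topics of (sqrt(p_k) - sqrt(q_k))^2.

    Assumed stand-in for Eq. (4.4); smaller values mean more similar documents.
    """
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def rank_similar(query_theta, doc_thetas, top_n=10):
    """Return the names of the top_n documents closest to the query."""
    scored = sorted(
        (hellinger_sq(query_theta, theta), name)
        for name, theta in doc_thetas.items()
    )
    return [name for _, name in scored[:top_n]]


# Hypothetical topic proportions for a query and three documents.
query = [0.7, 0.2, 0.1]
docs = {
    "doc_a": [0.6, 0.3, 0.1],
    "doc_b": [0.1, 0.1, 0.8],
    "doc_c": [0.7, 0.2, 0.1],
}
ranking = rank_similar(query, docs)
```

Because the score depends only on topic proportions, two documents can rank as similar even when they share few words, as long as they draw on the same topics.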
4.5 Discussion
We have described and discussed latent Dirichlet allocation and its application
to decomposing and exploring a large collection of documents. We have
also described two extensions: one allowing correlated occurrence of topics
and one allowing topics to evolve through time. We have seen how topic
modeling can provide a useful view of a large collection in terms of the collection
as a whole, the individual documents, and the relationships between the
documents.
There are several advantages of the generative probabilistic approach to
topic modeling, as opposed to a non-probabilistic method like LSI (12) or
non-negative matrix factorization (23). First, generative models are easily
applied to new data. This is essential for applications to tasks like information
retrieval or classification. Second, generative models are modular; they can
easily be used as a component in more complicated topic models. For example,
LDA has been used in models of authorship (42), syntax (19), and meeting
discourse (29). Finally, generative models are general in the sense that the
observation emission probabilities need not be discrete. Instead of words,
LDA-like models have been used to analyze images (15; 32; 6; 4), population
genetics data (28), survey data (13), and social networks data (1).
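
The first advantage, applying a fitted model to new data, can be illustrated with a minimal sketch. It assumes the topics (word distributions) are already estimated and held fixed, and it estimates a new document's topic proportions by simple EM on the mixture weights. This is a maximum-likelihood "fold-in" shortcut, not the variational posterior inference used for LDA; the function name and the toy topics are hypothetical.

```python
def fold_in(word_ids, topics, n_iter=50):
    """Estimate topic proportions for a new document with topics held fixed.

    word_ids: list of vocabulary indices, one per token in the new document.
    topics:   list of K word distributions; topics[k][w] = p(word w | topic k).
    Returns a list of K proportions that sum to 1.
    """
    k = len(topics)
    theta = [1.0 / k] * k  # start from uniform proportions
    for _ in range(n_iter):
        counts = [0.0] * k
        for w in word_ids:
            # E-step: responsibility of each topic for this token.
            resp = [theta[j] * topics[j][w] for j in range(k)]
            total = sum(resp)
            if total == 0.0:
                continue  # no topic can generate this word; skip it
            for j in range(k):
                counts[j] += resp[j] / total
        norm = sum(counts)
        if norm == 0.0:
            break  # nothing could be assigned; keep the current estimate
        # M-step: proportions are the normalized expected counts.
        theta = [c / norm for c in counts]
    return theta


# Two hypothetical topics over a three-word vocabulary.
toy_topics = [
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
]
new_doc = [0, 0, 0, 1]  # mostly word 0, which topic 0 favors
theta_new = fold_in(new_doc, toy_topics)
```

The resulting proportions could then feed a downstream task such as retrieval or classification without refitting the topics.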
We conclude with a word of caution. The topics and topical decomposition
found with LDA and other topic models are not “definitive.” Fitting a topic
model to a collection will yield patterns within the corpus whether or not they
are “naturally” there. (And starting the procedure from a different place will
yield different patterns!)
 