Latent Dirichlet Allocation (LDA) is mainly used to model text corpora based on
the bag-of-words assumption. LDA is a text model that was first introduced by
Blei et al. [42] to cluster co-occurring words into topics with semantic meanings. Since it
enables efficient processing of large collections while preserving the essential statistical
relationships, LDA has not only been used for text classification and summarization,
but has also been widely used to discover object categories from collections of images [43].
Three entities are used to describe the LDA model: "words", "documents",
and "corpora". A document is a sequence of words, which
are the basic units of discrete data, and a collection of M documents corresponds
to a corpus. The basic idea of LDA is that documents are represented as
random mixtures over latent topics, where each topic z is characterized by a
distribution over words w [42]. To borrow this algorithm from the text literature, many
researchers have extended the LDA model to computer vision problems by mapping
quantized local descriptors (e.g., SIFT descriptors [44]) to "visual words". Each
cluster center obtained by k-means clustering can be regarded as a visual word, and
a document (e.g., an image) is then represented as a histogram of visual words,
namely the bag of words; a minimal sketch of this quantization step is given below.
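The following is a minimal sketch of the visual-word quantization step described above, assuming local descriptors (such as SIFT) have already been extracted per image. Function names and parameter values are illustrative, not drawn from a specific system.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors, num_words=500, seed=0):
    """Cluster pooled descriptors; each k-means center is one visual word."""
    kmeans = KMeans(n_clusters=num_words, n_init=10, random_state=seed)
    kmeans.fit(all_descriptors)        # shape: (num_descriptors, dim)
    return kmeans

def image_histogram(kmeans, image_descriptors):
    """Represent one image (a 'document') as a histogram of visual words."""
    words = kmeans.predict(image_descriptors)   # nearest visual word per descriptor
    hist = np.bincount(words, minlength=kmeans.n_clusters)
    return hist / hist.sum()                    # normalized word counts
```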
Based on the LDA graphical model shown in Fig. 1.6, a generative process for each document in a corpus can be obtained by defining certain distributions, such as $\theta \sim \mathrm{Dir}(\alpha)$ and $z_n \sim \mathrm{Multinomial}(\theta)$. The details of the LDA algorithm can be found in [42]. Given the training data, the LDA model is trained by maximizing the marginal distribution $p(\mathbf{w} \mid \alpha, \beta)$ via a Gibbs sampler.
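A minimal sketch of this generative process, assuming illustrative corpus sizes and hyperparameters; the actual inference step (e.g., Gibbs sampling) is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, N = 5, 1000, 80          # topics, vocabulary size, words per document

alpha = np.full(K, 0.1)                          # Dirichlet prior over topic mixtures
beta = rng.dirichlet(np.full(V, 0.01), size=K)   # per-topic word distributions

def generate_document():
    theta = rng.dirichlet(alpha)            # theta ~ Dir(alpha): topic mixture
    z = rng.choice(K, size=N, p=theta)      # z_n ~ Multinomial(theta): topic per word
    w = np.array([rng.choice(V, p=beta[k]) for k in z])  # w_n ~ Multinomial(beta[z_n])
    return w, z, theta
```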
Since the traditional LDA model treats a document only as a bag of
words, spatial relationships among adjacent words are ignored, which lowers
accuracy in recognition tasks. Thus, many researchers have sought to improve
performance by incorporating spatial relations into the LDA model.
For example, Cao and Fei-Fei introduced a spatially coherent latent topic model
(Spatial-LTM) that improves the traditional bag-of-words representation of texts
and images [45]. In this model, an image is first partitioned into regions, each
described by an appearance feature and a set of visual words. Each region is treated as
a document, and region labels denote the latent topics. The Spatial-LTM model
is estimated by a variational message passing algorithm, which can simultaneously
segment and classify objects. A similar extension of the LDA model can
be found in the Spatial Latent Dirichlet Allocation model [46], which encodes the spatial
structure among visual words by clustering visual words that are close in space
into one topic. A simplified sketch of the region-as-document idea is given below.
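This sketch only illustrates the preprocessing idea of grouping spatially nearby visual words into region "documents", using a fixed grid as a stand-in for a real segmentation; the actual Spatial-LTM couples regions with appearance features and is fitted by variational message passing, which is not shown.

```python
import numpy as np

def regions_as_documents(word_ids, positions, image_size, grid=(4, 4)):
    """Group visual words into grid-cell 'documents' by location.

    word_ids:  (n,) visual-word index per local descriptor
    positions: (n, 2) (x, y) keypoint coordinates
    """
    w, h = image_size
    cols = np.clip((positions[:, 0] / w * grid[0]).astype(int), 0, grid[0] - 1)
    rows = np.clip((positions[:, 1] / h * grid[1]).astype(int), 0, grid[1] - 1)
    cell = rows * grid[0] + cols                         # region index per word
    return [word_ids[cell == c] for c in range(grid[0] * grid[1])]
```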
Fig. 1.6 LDA graphical model