Dirichlet Allocation (LDA), which is mainly used to model text corpora under the bag-of-words assumption. LDA is a text model first introduced by Blei et al. [42] to cluster co-occurring words into topics with semantic meanings. Since it enables efficient processing of large collections while preserving the essential statistical relationships, LDA has been used not only for text classification and summarization, but also to discover object categories from collections of images [43].
Several entities are used to describe the LDA model, including "words", "documents", and "corpora". A document is a sequence of words, which are the basic units of discrete data, and a collection of M documents corresponds to a corpus. The basic idea of LDA is that documents are represented as random mixtures over latent topics, where each topic z is characterized by a distribution over words w [42]. Borrowing this algorithm from the text literature, many researchers have extended the LDA model to computer vision problems by mapping quantized local descriptors (e.g., SIFT descriptors [44]) to "visual words". Each cluster center obtained after k-means clustering can be regarded as a visual word, and an image (treated as a document) is represented as a histogram of visual words,
namely the bag of words. Based on the LDA graphical model shown in Fig. 1.6, a generative process for each document in a corpus can be obtained by defining certain distributions, such as θ ∼ Dir(α) and z_n ∼ Multinomial(θ). The details of the LDA algorithm can be found in [42]. Given the training data, the LDA model is fit by maximizing the marginal distribution p(w | α, β) via a Gibbs sampler.
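The generative process just described can be sketched as follows. This is an illustrative implementation only, not code from the cited work; the number of topics K, vocabulary size V, and document length N are assumed values, and the per-topic word distributions β are themselves sampled from a Dirichlet for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, N = 3, 10, 20                       # topics, vocabulary size, words per document (assumed)
alpha = np.full(K, 0.5)                   # symmetric Dirichlet prior over topic mixtures
beta = rng.dirichlet(np.full(V, 0.1), K)  # per-topic word distributions, shape (K, V)

def generate_document(alpha, beta, n_words, rng):
    """Draw one synthetic document following the LDA generative process."""
    theta = rng.dirichlet(alpha)                        # theta ~ Dir(alpha)
    z = rng.choice(len(alpha), size=n_words, p=theta)   # z_n ~ Multinomial(theta)
    # each word w_n is drawn from the word distribution of its topic z_n
    words = np.array([rng.choice(beta.shape[1], p=beta[k]) for k in z])
    return words, z, theta

words, z, theta = generate_document(alpha, beta, N, rng)
```

Fitting the model inverts this process: given only the observed words, a Gibbs sampler infers the latent topic assignments z_n and thereby the topic mixtures and word distributions.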
Since the traditional LDA model treats a document simply as a bag of words, spatial relationships among adjacent words are ignored, which lowers the accuracy of recognition tasks. Thus, many researchers have improved performance by incorporating spatial relations into the LDA model. For example, Cao and Fei-Fei introduced a spatially coherent latent topic model (Spatial-LTM) that improves the traditional bag-of-words representation of texts and images [45]. In this model, an image is first partitioned into regions, each described by its appearance features as a set of visual words. Each region is treated as a document, and the labels of regions denote the latent topics. The Spatial-LTM model is estimated by the variational message passing algorithm, which can simultaneously segment and classify objects. A similar extension of the LDA model can be found in the Spatial Latent Dirichlet Allocation model [46], which encodes spatial structure among visual words by clustering visual words that are close in space into one topic.
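The region-as-document construction underlying these spatial models can be sketched as follows: local descriptors are quantized with k-means, each cluster center acts as a visual word, and each spatial region of the image is summarized as a histogram of the visual words falling inside it. This is an illustrative sketch, not the authors' code; the descriptor dimension, descriptor count, vocabulary size, and the 2x2 grid partition are all assumed for demonstration.

```python
import numpy as np

def kmeans(X, k, n_iter=20, rng=None):
    """Minimal Lloyd's k-means: returns cluster centers and point assignments."""
    rng = rng or np.random.default_rng(0)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(0)
    return centers, labels

rng = np.random.default_rng(1)
descriptors = rng.normal(size=(500, 128))     # e.g., 500 SIFT-like descriptors (assumed)
positions = rng.uniform(0, 1, size=(500, 2))  # (x, y) location of each descriptor

n_words = 20
centers, labels = kmeans(descriptors, n_words, rng=rng)

# Partition the image into a 2x2 grid and build one visual-word
# histogram ("document") per region.
region = (positions[:, 0] > 0.5).astype(int) * 2 + (positions[:, 1] > 0.5).astype(int)
documents = np.stack([np.bincount(labels[region == r], minlength=n_words)
                      for r in range(4)])
```

Each row of `documents` is the bag-of-visual-words histogram for one region, which can then be fed to an LDA-style topic model so that spatially coherent regions share topics.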
Fig. 1.6 LDA graphical model