Dirichlet Allocation (LDA), which is mainly used to model text corpora under the bag-of-words assumption. LDA is a text model first introduced by Blei et al. [42] to cluster co-occurring words into topics with semantic meanings. Since it enables efficient processing of large collections while preserving the essential statistical relationships, LDA has been used not only for text classification and summarization, but also to discover object categories from collections of images [43].
Several entities are used to describe the LDA model, including "words", "documents", and "corpora". A document is a sequence of words, which are the basic units of discrete data, and a collection of M documents corresponds to a corpus. The basic idea of LDA is that documents are represented as random mixtures over latent topics, where each topic z is characterized by a distribution over words w [42]. Borrowing this algorithm from the text literature, many researchers have extended the LDA model to computer vision problems by mapping quantized local descriptors (e.g., SIFT descriptors [44]) to "visual words". Each cluster center obtained after k-means clustering can be regarded as a visual word, and an image (treated as a document) is represented as a histogram of visual words,
namely the bag of words. Based on the LDA graphical model shown in Fig. 1.6, a generative process for each document in a corpus can be obtained by defining certain distributions, such as θ ∼ Dir(α) and z_n ∼ Multinomial(θ). The details of the LDA algorithm can be found in [42]. Given the training data, the LDA model is fit by maximizing the marginal distribution p(w | α, β) via a Gibbs sampler.
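The generative process just described can be sketched as follows. This is an illustrative implementation only, not code from the cited work; the number of topics K, vocabulary size V, and document length N are assumed values, and the per-topic word distributions β are themselves sampled from a Dirichlet for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, N = 3, 10, 20                       # topics, vocabulary size, words per document (assumed)
alpha = np.full(K, 0.5)                   # symmetric Dirichlet prior over topic mixtures
beta = rng.dirichlet(np.full(V, 0.1), K)  # per-topic word distributions, shape (K, V)

def generate_document(alpha, beta, n_words, rng):
    """Draw one synthetic document following the LDA generative process."""
    theta = rng.dirichlet(alpha)                        # theta ~ Dir(alpha)
    z = rng.choice(len(alpha), size=n_words, p=theta)   # z_n ~ Multinomial(theta)
    # each word w_n is drawn from the word distribution of its topic z_n
    words = np.array([rng.choice(beta.shape[1], p=beta[k]) for k in z])
    return words, z, theta

words, z, theta = generate_document(alpha, beta, N, rng)
```

Fitting the model inverts this process: given only the observed words, a Gibbs sampler infers the latent topic assignments z_n and thereby the topic mixtures and word distributions.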
Since the traditional LDA model treats a document simply as a bag of words, spatial relationships among adjacent words are ignored, which lowers the accuracy of recognition tasks. Thus, many researchers have improved performance by incorporating spatial relations into the LDA model. For example, Cao and Fei-Fei introduced a spatially coherent latent topic model (Spatial-LTM) that improves the traditional bag-of-words representation of texts and images [45]. In this model, an image is first partitioned into regions, each described by its appearance features as a set of visual words. Each region is treated as a document, and the labels of regions denote the latent topics. The Spatial-LTM model is estimated by the variational message passing algorithm, which can simultaneously segment and classify objects. A similar extension of the LDA model can be found in the Spatial Latent Dirichlet Allocation model [46], which encodes spatial structure among visual words by clustering visual words that are close in space into one topic.
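The region-as-document construction underlying these spatial models can be sketched as follows: local descriptors are quantized with k-means, each cluster center acts as a visual word, and each spatial region of the image is summarized as a histogram of the visual words falling inside it. This is an illustrative sketch, not the authors' code; the descriptor dimension, descriptor count, vocabulary size, and the 2x2 grid partition are all assumed for demonstration.

```python
import numpy as np

def kmeans(X, k, n_iter=20, rng=None):
    """Minimal Lloyd's k-means: returns cluster centers and point assignments."""
    rng = rng or np.random.default_rng(0)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(0)
    return centers, labels

rng = np.random.default_rng(1)
descriptors = rng.normal(size=(500, 128))     # e.g., 500 SIFT-like descriptors (assumed)
positions = rng.uniform(0, 1, size=(500, 2))  # (x, y) location of each descriptor

n_words = 20
centers, labels = kmeans(descriptors, n_words, rng=rng)

# Partition the image into a 2x2 grid and build one visual-word
# histogram ("document") per region.
region = (positions[:, 0] > 0.5).astype(int) * 2 + (positions[:, 1] > 0.5).astype(int)
documents = np.stack([np.bincount(labels[region == r], minlength=n_words)
                      for r in range(4)])
```

Each row of `documents` is the bag-of-visual-words histogram for one region, which can then be fed to an LDA-style topic model so that spatially coherent regions share topics.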
Fig. 1.6 LDA graphical model