Digital Signal Processing Reference
3.4.1 pLSA and LDA
Sivic et al. [50] discovered object classes from a set of unlabeled images and segmented the images into different object classes using pLSA and LDA. They modeled an image as a bag of visual words and ignored any spatial relationships among
visual words. Suppose there are $M$ images in the data set. Each image $j$ has $N_j$ visual words. Each visual word $w_{ji}$ is assigned one of the $K$ object classes according to its label $z_{ji}$. Under pLSA, the joint probability $P(\{w_{ji}\}, \{d_j\}, \{z_{ji}\})$ has the form of the graphical model shown in Fig. 3.8a. The conditional probability $P(w_{ji} \mid d_j)$, marginalizing over topics $z_{ji}$, is given by
$$P(w_{ji} \mid d_j) = \sum_{k=1}^{K} P(z_{ji} = k \mid d_j)\, P(w_{ji} \mid z_{ji} = k). \qquad (3.9)$$
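Numerically, Eq. (3.9) says that the word distribution of an image is a convex combination of the $K$ class word distributions, weighted by the image's class mixture. A minimal NumPy sketch (the array names and sizes are illustrative, not from the text):

```python
import numpy as np

K, V = 3, 5                      # number of object classes, vocabulary size
rng = np.random.default_rng(0)

# P(w | z = k): one discrete distribution over the vocabulary per object class
p_w_given_z = rng.dirichlet(np.ones(V), size=K)   # shape (K, V)
# P(z = k | d_j): class mixture weights of one image d_j
p_z_given_d = rng.dirichlet(np.ones(K))           # shape (K,)

# Eq. (3.9): P(w | d_j) = sum_k P(z = k | d_j) * P(w | z = k)
p_w_given_d = p_z_given_d @ p_w_given_z           # shape (V,)

assert np.isclose(p_w_given_d.sum(), 1.0)         # still a valid distribution
```

Because each class model is a distribution and the mixture weights sum to one, the result is again a distribution over the vocabulary.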
$P(w_{ji} \mid z_{ji} = k)$ is the probability of visual word $w_{ji}$ occurring in object class $k$ and is the model of object class $k$. $P(z_{ji} = k \mid d_j)$ is the probability of object class $k$ occurring in image $d_j$. Fitting the pLSA model involves determining $P(w_{ji} \mid z_{ji})$ and $P(z_{ji} \mid d_j)$ by maximizing the following objective function using the Expectation Maximization (EM) algorithm:

$$L = \prod_{j=1}^{M} \prod_{i=1}^{N_j} P(w_{ji} \mid d_j). \qquad (3.10)$$
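The EM iteration alternates between computing the posterior $P(z_{ji}=k \mid d_j, w_{ji}) \propto P(z_{ji}=k \mid d_j)\,P(w_{ji} \mid z_{ji}=k)$ (E-step) and re-estimating both factors from the expected counts (M-step). A compact sketch on a word-count matrix; the variable names and toy data are illustrative, not from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
M, V, K = 4, 6, 2                      # images, vocabulary size, classes
n = rng.integers(1, 5, size=(M, V))    # n[j, w]: count of word w in image j

# random initialization of the two factors to be estimated
p_w_z = rng.dirichlet(np.ones(V), size=K)   # P(w | z = k), shape (K, V)
p_z_d = rng.dirichlet(np.ones(K), size=M)   # P(z = k | d_j), shape (M, K)

for _ in range(50):
    # E-step: posterior P(z = k | d_j, w) for every (image, word) pair
    joint = p_z_d[:, :, None] * p_w_z[None, :, :]     # shape (M, K, V)
    post = joint / joint.sum(axis=1, keepdims=True)   # normalize over k
    # M-step: accumulate expected counts n[j, w] * post[j, k, w]
    c = n[:, None, :] * post                          # shape (M, K, V)
    p_w_z = c.sum(axis=0)
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = c.sum(axis=2)
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)

# log of the likelihood in Eq. (3.10), with repeated words as counts
loglik = (n * np.log(p_z_d @ p_w_z)).sum()
```

Each occurrence of word $w$ in image $j$ can then be labeled with $\arg\max_k$ `post[j, k, w]`, which yields the per-word labels used below to segment images.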
Images are segmented into objects with semantic meanings based on the labels $z_{ji}$ of the visual words.
pLSA is a generative model only for the training images, not for new images. This shortcoming has been addressed by LDA, whose graphical model is shown in Fig. 3.8b. Under LDA, $\{\phi_k\}$ are the models of the object classes and are discrete distributions over the codebook of visual words. They are generated from a Dirichlet prior $\mathrm{Dir}(\phi_k; \beta)$ with hyperparameter $\beta$. Each image $j$ has a multinomial distribution $\pi_j$ over the $K$ object
Fig. 3.8 Graphical models of pLSA and LDA
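The LDA generative process described above can be simulated directly: class models $\phi_k$ are drawn from $\mathrm{Dir}(\beta)$, each image gets a class mixture $\pi_j$ (in full LDA, $\pi_j$ itself has a Dirichlet prior with hyperparameter $\alpha$), and every visual word is drawn from the class model selected by its label. The hyperparameter values and image size here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
K, V = 3, 8                 # object classes, codebook size
beta, alpha = 0.5, 0.8      # Dirichlet hyperparameters (illustrative values)

# class models: discrete distributions over the codebook, phi_k ~ Dir(beta)
phi = rng.dirichlet(np.full(V, beta), size=K)       # shape (K, V)

def generate_image(n_words):
    """Sample one image: a class mixture pi_j, then labeled visual words."""
    pi = rng.dirichlet(np.full(K, alpha))            # mixture over classes
    z = rng.choice(K, size=n_words, p=pi)            # labels z_ji
    w = np.array([rng.choice(V, p=phi[k]) for k in z])  # visual words w_ji
    return z, w

z, w = generate_image(20)
```

Because the class models are drawn from a prior rather than fixed to the training set, the same process generates unseen images, which is exactly the property pLSA lacks.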