Digital Signal Processing Reference
3.4.1 pLSA and LDA
Sivic et al. [50] discovered object classes from a set of unlabeled images and segmented the images into different object classes using pLSA and LDA. They modeled an image as a bag of visual words, ignoring any spatial relationships among the visual words. Suppose there are M images in the data set. Each image j has N_j visual words. Each visual word w_ji is assigned one of the K object classes according to its label z_ji. Under pLSA, the joint probability P({w_ji}, {d_j}, {z_ji}) has the form of the graphical model shown in Fig. 3.8a. The conditional probability P(w_ji | d_j), marginalizing over the topics z_ji, is given by

P(w_{ji} | d_j) = \sum_{k=1}^{K} P(z_{ji} = k | d_j) P(w_{ji} | z_{ji} = k).    (3.9)
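As a quick sanity check, the mixture in Eq. (3.9) is a matrix-vector product between the per-image class distribution and the per-class word distributions. All the numbers below are made-up illustrative values, not from the text:

```python
import numpy as np

# Hypothetical distributions for one image d_j with K = 3 object classes
# and a codebook of 4 visual words (illustrative values only).
p_z_given_d = np.array([0.5, 0.3, 0.2])   # P(z = k | d_j), sums to 1
p_w_given_z = np.array([                  # rows: classes k, cols: visual words w
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.60, 0.20, 0.10],
    [0.25, 0.25, 0.25, 0.25],
])

# Eq. (3.9): P(w | d_j) = sum_k P(z = k | d_j) * P(w | z = k)
p_w_given_d = p_z_given_d @ p_w_given_z
print(p_w_given_d)        # a proper distribution over the 4 words
print(p_w_given_d.sum())  # 1.0
```

Because each row of `p_w_given_z` and the vector `p_z_given_d` sum to one, the resulting mixture is itself a valid distribution over the codebook.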
P(w_{ji} | z_{ji} = k) is the probability of visual word w_ji occurring in object class k, and it serves as the model of object class k. P(z_{ji} = k | d_j) is the probability of object class k occurring in image d_j. Fitting the pLSA model involves determining P(w_{ji} | z_{ji}) and P(z_{ji} = k | d_j) by maximizing the following likelihood using the Expectation-Maximization (EM) algorithm:

L = \prod_{j=1}^{M} \prod_{i=1}^{N_j} P(w_{ji} | d_j).    (3.10)
Images are segmented into objects with semantic meanings based on the labels z_ji of the visual words.
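A minimal EM fit of the two pLSA factors, following Eqs. (3.9) and (3.10), might look like the sketch below. The toy count matrix, the number of classes K, and the iteration count are all assumptions for illustration, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy term-document counts n[j, w]: M = 4 "images", codebook of 6 visual words.
counts = np.array([
    [5, 4, 0, 0, 1, 0],
    [4, 5, 1, 0, 0, 0],
    [0, 0, 5, 4, 0, 1],
    [0, 1, 4, 5, 0, 0],
], dtype=float)
M, W = counts.shape
K = 2  # assumed number of object classes

# Random initialization of P(z = k | d_j) and P(w | z = k).
p_z_d = rng.dirichlet(np.ones(K), size=M)   # shape (M, K)
p_w_z = rng.dirichlet(np.ones(W), size=K)   # shape (K, W)

for _ in range(100):
    # E-step: responsibilities P(z = k | d_j, w) ∝ P(z = k | d_j) P(w | z = k)
    resp = p_z_d[:, :, None] * p_w_z[None, :, :]   # (M, K, W)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate both factors from expected counts.
    weighted = counts[:, None, :] * resp           # (M, K, W)
    p_z_d = weighted.sum(axis=2)
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = weighted.sum(axis=0)
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

# Log of the likelihood in Eq. (3.10), with counts as word multiplicities.
p_w_d = p_z_d @ p_w_z
loglik = (counts * np.log(p_w_d + 1e-12)).sum()
print(loglik)
```

Each EM iteration is guaranteed not to decrease the likelihood (3.10); in practice one would monitor `loglik` and stop once it plateaus.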
pLSA is a generative model only for the training images, not for new images. This shortcoming is addressed by LDA, whose graphical model is shown in Fig. 3.8b. Under LDA, {φ_k} are the models of the object classes and are discrete distributions over the codebook of visual words. They are generated from a Dirichlet prior Dir(φ_k; β) parameterized by β. Each image j has a multinomial distribution π_j over the K object classes.

Fig. 3.8 Graphical models of pLSA and LDA
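The LDA generative process described above can be sketched as ancestral sampling. The hyperparameters α and β and all sizes below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

K, W = 3, 10            # assumed: 3 object classes, codebook of 10 visual words
alpha, beta = 0.5, 0.1  # assumed Dirichlet hyperparameters
M, N = 2, 20            # assumed: 2 images, 20 visual words per image

# Class models phi_k ~ Dir(beta): discrete distributions over the codebook.
phi = rng.dirichlet(np.full(W, beta), size=K)

images = []
for j in range(M):
    # Per-image mixture pi_j ~ Dir(alpha) over the K object classes.
    pi_j = rng.dirichlet(np.full(K, alpha))
    words = []
    for i in range(N):
        z_ji = rng.choice(K, p=pi_j)       # sample the class label z_ji
        w_ji = rng.choice(W, p=phi[z_ji])  # sample a visual word from phi_k
        words.append((w_ji, z_ji))
    images.append(words)

print(images[0][:5])  # first few (word, class) pairs of the first image
```

Because π_j is drawn per image from a shared prior, the same sampling procedure applies to an unseen image, which is exactly the generative capability pLSA lacks.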