Digital Signal Processing Reference
3.4.1 pLSA and LDA
Sivic et al. [50] discovered object classes from a set of unlabeled images and segmented the images into different object classes using pLSA and LDA. They modeled an image as a bag of visual words and ignored any spatial relationships among
visual words. Suppose there are $M$ images in the data set. Each image $j$ has $N_j$ visual words. Each visual word $w_{ji}$ is assigned one of the $K$ object classes according to its label $z_{ji}$. Under pLSA, the joint probability $P(\{w_{ji}\}, \{d_j\}, \{z_{ji}\})$ has the form of the graphical model shown in Fig. 3.8a. The conditional probability $P(w_{ji} \mid d_j)$, marginalizing over topics $z_{ji}$, is given by
$$P(w_{ji} \mid d_j) = \sum_{k=1}^{K} P(z_{ji} = k \mid d_j)\, P(w_{ji} \mid z_{ji} = k). \qquad (3.9)$$
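Numerically, Eq. (3.9) says that the word distribution of an image is a convex combination of the $K$ class word distributions, weighted by the image's class mixture. A minimal NumPy sketch (the array names and sizes are illustrative, not from the text):

```python
import numpy as np

K, V = 3, 5                      # number of object classes, vocabulary size
rng = np.random.default_rng(0)

# P(w | z = k): one discrete distribution over the vocabulary per object class
p_w_given_z = rng.dirichlet(np.ones(V), size=K)   # shape (K, V)
# P(z = k | d_j): class mixture weights of one image d_j
p_z_given_d = rng.dirichlet(np.ones(K))           # shape (K,)

# Eq. (3.9): P(w | d_j) = sum_k P(z = k | d_j) * P(w | z = k)
p_w_given_d = p_z_given_d @ p_w_given_z           # shape (V,)

assert np.isclose(p_w_given_d.sum(), 1.0)         # still a valid distribution
```

Because each class model is a distribution and the mixture weights sum to one, the result is again a distribution over the vocabulary.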
$P(w_{ji} \mid z_{ji} = k)$ is the probability of visual word $w_{ji}$ occurring in object class $k$ and is the model of object class $k$. $P(z_{ji} = k \mid d_j)$ is the probability of object class $k$ occurring in image $d_j$. Fitting the pLSA model involves determining $P(w_{ji} \mid z_{ji})$ and $P(z_{ji} \mid d_j)$ by maximizing the following objective function using the Expectation Maximization (EM) algorithm:

$$L = \prod_{j=1}^{M} \prod_{i=1}^{N_j} P(w_{ji} \mid d_j). \qquad (3.10)$$
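The EM iteration alternates between computing the posterior $P(z_{ji}=k \mid d_j, w_{ji}) \propto P(z_{ji}=k \mid d_j)\,P(w_{ji} \mid z_{ji}=k)$ (E-step) and re-estimating both factors from the expected counts (M-step). A compact sketch on a word-count matrix; the variable names and toy data are illustrative, not from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
M, V, K = 4, 6, 2                      # images, vocabulary size, classes
n = rng.integers(1, 5, size=(M, V))    # n[j, w]: count of word w in image j

# random initialization of the two factors to be estimated
p_w_z = rng.dirichlet(np.ones(V), size=K)   # P(w | z = k), shape (K, V)
p_z_d = rng.dirichlet(np.ones(K), size=M)   # P(z = k | d_j), shape (M, K)

for _ in range(50):
    # E-step: posterior P(z = k | d_j, w) for every (image, word) pair
    joint = p_z_d[:, :, None] * p_w_z[None, :, :]     # shape (M, K, V)
    post = joint / joint.sum(axis=1, keepdims=True)   # normalize over k
    # M-step: accumulate expected counts n[j, w] * post[j, k, w]
    c = n[:, None, :] * post                          # shape (M, K, V)
    p_w_z = c.sum(axis=0)
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = c.sum(axis=2)
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)

# log of the likelihood in Eq. (3.10), with repeated words as counts
loglik = (n * np.log(p_z_d @ p_w_z)).sum()
```

Each occurrence of word $w$ in image $j$ can then be labeled with $\arg\max_k$ `post[j, k, w]`, which yields the per-word labels used below to segment images.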
Images are segmented into objects with semantic meanings based on the labels $z_{ji}$ of the visual words.
pLSA is a generative model only for the training images, not for new images. This shortcoming has been addressed by LDA, whose graphical model is shown in Fig. 3.8b. Under LDA, $\{\phi_k\}$ are the models of the object classes and are discrete distributions over the codebook of visual words. They are generated from a Dirichlet prior $\mathrm{Dir}(\phi_k; \beta)$ with hyperparameter $\beta$. Each image $j$ has a multinomial distribution $\pi_j$ over the $K$ object
Fig. 3.8 Graphical models of pLSA and LDA
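The LDA generative process described above can be simulated directly: class models $\phi_k$ are drawn from $\mathrm{Dir}(\beta)$, each image gets a class mixture $\pi_j$ (in full LDA, $\pi_j$ itself has a Dirichlet prior with hyperparameter $\alpha$), and every visual word is drawn from the class model selected by its label. The hyperparameter values and image size here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
K, V = 3, 8                 # object classes, codebook size
beta, alpha = 0.5, 0.8      # Dirichlet hyperparameters (illustrative values)

# class models: discrete distributions over the codebook, phi_k ~ Dir(beta)
phi = rng.dirichlet(np.full(V, beta), size=K)       # shape (K, V)

def generate_image(n_words):
    """Sample one image: a class mixture pi_j, then labeled visual words."""
    pi = rng.dirichlet(np.full(K, alpha))            # mixture over classes
    z = rng.choice(K, size=n_words, p=pi)            # labels z_ji
    w = np.array([rng.choice(V, p=phi[k]) for k in z])  # visual words w_ji
    return z, w

z, w = generate_image(20)
```

Because the class models are drawn from a prior rather than fixed to the training set, the same process generates unseen images, which is exactly the property pLSA lacks.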