Digital Signal Processing Reference
Table 3.1 Confusion table of using pLSA for image classification on a data set of five
object categories from the Caltech 101 database [55]. The number of classes in pLSA is
set to 7; three of them correspond to the background. The result was reported in [50]
                          True class
pLSA class                       Faces  Motorbikes   Airplanes        Cars  Background
Class 1 - Faces                  94.02        0.00        0.38        0.00        1.00
Class 2 - Motorbikes              0.00       83.62        0.12        0.00        1.25
Class 3 - Airplanes               0.00        0.50       95.25        0.52        0.50
Class 4 - Cars                    0.46        0.88        0.38       98.1         3.75
Class 5 - Background I            1.84        0.38        0.88        0.26       41.75
Class 6 - Background II           3.68       12.88        0.88        0.00       23.00
Class 7 - Background III          0.00        1.75        2.12        1.13       28.75
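Because three of the seven pLSA classes correspond to the background, one way to read Table 3.1 is to sum the three background rows and then look at the diagonal of the resulting 5 x 5 table. The sketch below does this with NumPy. It assumes that each column holds percentages of the images of one true class (the columns sum to roughly 100); the names `confusion`, `merge_background` and `per_class` are illustrative only, and the sketch is not claimed to reproduce the overall accuracy reported in [50].

```python
import numpy as np

# Values copied from Table 3.1.
# Rows: the seven pLSA classes. Columns: the five true classes.
true_classes = ["Faces", "Motorbikes", "Airplanes", "Cars", "Background"]
confusion = np.array([
    [94.02,  0.00,  0.38,  0.00,  1.00],   # Class 1 - Faces
    [ 0.00, 83.62,  0.12,  0.00,  1.25],   # Class 2 - Motorbikes
    [ 0.00,  0.50, 95.25,  0.52,  0.50],   # Class 3 - Airplanes
    [ 0.46,  0.88,  0.38, 98.10,  3.75],   # Class 4 - Cars
    [ 1.84,  0.38,  0.88,  0.26, 41.75],   # Class 5 - Background I
    [ 3.68, 12.88,  0.88,  0.00, 23.00],   # Class 6 - Background II
    [ 0.00,  1.75,  2.12,  1.13, 28.75],   # Class 7 - Background III
])


def merge_background(table):
    """Sum the last three rows (Background I-III) into one Background row."""
    return np.vstack([table[:4], table[4:].sum(axis=0)])


merged = merge_background(confusion)

# After merging, the diagonal holds the percentage of each true class
# that was assigned to its own (merged) class.
per_class = dict(zip(true_classes, np.diag(merged)))
```

With the values above, the merged Background entry is 41.75 + 23.00 + 28.75 = 93.50, so most background images fall into one of the three background classes even though no single one of them dominates.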
classes and it is generated from a Dirichlet prior $\mathrm{Dir}(\pi_j; \alpha)$. Each
patch $i$ on image $j$ is assigned to one of the $K$ object classes and its label
$z_{ji}$ is sampled from a discrete distribution $\mathrm{Discrete}(z_{ji}; \pi_j)$.
The observed visual word $w_{ji}$ is sampled from the model of its object class,
given by $p(w_{ji} \mid \phi_{z_{ji}})$. $\alpha$ and $\beta$ are hyperparameters.
$\phi_k$, $\pi_j$ and $z_{ji}$ are hidden variables to be inferred. The inference can be implemented
by variational methods [10] or collapsed Gibbs sampling [54]. Under LDA, if two
visual words often co-occur in the same images, one of the object class models
will assign high probability to both of them. pLSA and LDA perform similarly on
image classification and object segmentation, and their results were promising,
especially when each image contained only one object. As reported by [50], on a data
set consisting of 4,090 images of five categories from the Caltech 101 database [55],
the image classification accuracy achieved by pLSA was 92.5% (see Table 3.1) and its
object segmentation accuracy was 49%. Both pLSA and LDA require the number
of object classes to be known in advance. As an extension, the Hierarchical Dirichlet
Process (HDP) proposed by Teh et al. [54] can automatically learn the number of
object classes from data using Dirichlet Processes [56] as priors.
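The generative process of LDA described above can be sketched as a small sampler. This is only an illustration under several assumptions that are not taken from the text: the priors are symmetric, the class models $\phi_k$ are drawn from $\mathrm{Dir}(\beta)$ as in standard LDA, every image has the same number of patches, and the function name `sample_lda_corpus` and its parameters are hypothetical.

```python
import numpy as np


def sample_lda_corpus(num_images, patches_per_image, K, V, alpha, beta, seed=0):
    """Draw visual words from the LDA generative process.

    K is the number of object classes and V the size of the visual
    vocabulary. Symmetric Dirichlet priors are an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)

    # phi[k]: distribution over the V visual words for object class k.
    phi = rng.dirichlet(np.full(V, beta), size=K)
    # pi[j]: mixture over the K object classes for image j.
    pi = rng.dirichlet(np.full(K, alpha), size=num_images)

    z = np.empty((num_images, patches_per_image), dtype=int)
    w = np.empty_like(z)
    for j in range(num_images):
        # Label of each patch i on image j: z_ji ~ Discrete(pi_j).
        z[j] = rng.choice(K, size=patches_per_image, p=pi[j])
        for i in range(patches_per_image):
            # Visual word of the patch: w_ji ~ Discrete(phi_{z_ji}).
            w[j, i] = rng.choice(V, p=phi[z[j, i]])
    return phi, pi, z, w


phi, pi, z, w = sample_lda_corpus(
    num_images=3, patches_per_image=10, K=4, V=20, alpha=0.5, beta=0.5
)
```

In practice only `w` is observed. Inference by variational methods or collapsed Gibbs sampling runs in the opposite direction and estimates `phi`, `pi` and `z` from the visual words.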
3.4.2 SLDA
A shortcoming of using pLSA and LDA to segment objects is that they treat an image
as a document of visual words, ignoring the spatial structure among visual words. The
assumption that two types of patches from the same object class often appear in
the same images is not strong enough. In the example shown in Fig. 3.9,
although the sky is far from the vehicles, if they often occur in the same images in
the data set, they would be clustered into the same topic (object class) by pLSA
or LDA. Since most of this image is sky and building, an image patch on
a vehicle is likely to be labeled as building or sky as well. Such problems can be
solved if the document of an image patch, such as the yellow patch in Fig. 3.9,
includes only the patches falling within its neighborhood, marked by the red dashed
window in Fig. 3.9, instead of the whole image.
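The idea of restricting the document of a patch to its neighborhood can be illustrated with a short sketch. It assumes that each patch is described by the pixel coordinates of its center and a visual word index, and that the neighborhood is a square window. The function `neighborhood_document` and its parameters are hypothetical, and the sketch shows only how such a document could be collected, not the SLDA model itself.

```python
import numpy as np


def neighborhood_document(centers, words, index, half_window):
    """Collect the visual words of all patches inside a square window.

    centers: (N, 2) patch center coordinates (x, y) in pixels.
    words: length-N visual word indices, one per patch.
    index: the patch whose document is being built.
    half_window: half the side length of the window, in pixels.
    """
    centers = np.asarray(centers, dtype=float)
    words = np.asarray(words)
    # Absolute offset of every patch from the chosen patch, per axis.
    offsets = np.abs(centers - centers[index])
    # A patch is inside the window if both offsets are small enough.
    inside = np.all(offsets <= half_window, axis=1)
    return words[inside]


centers = [(10, 10), (12, 11), (50, 50), (14, 30)]
words = [3, 7, 7, 1]
local_doc = neighborhood_document(centers, words, index=0, half_window=5)
whole_image_doc = neighborhood_document(centers, words, index=0, half_window=1000)
```

Here `local_doc` contains only the words of the first two patches, while `whole_image_doc` contains all four, which corresponds to the whole-image document used by pLSA and LDA.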
 