Semantic Object Segmentation - Video Segmentation and Its Applications


Table 3.1 Confusion table of using pLSA for image classification on a data set of five object categories from the Caltech 101 database [55]. Class number is equal to 7 in pLSA. Three classes correspond to the background. The result was reported in [50]

True class →              Faces  Motorbikes  Airplanes   Cars  Background
Class 1 - Faces           94.02        0.00       0.38   0.00        1.00
Class 2 - Motorbikes       0.00       83.62       0.12   0.00        1.25
Class 3 - Airplanes        0.00        0.50      95.25   0.52        0.50
Class 4 - Cars             0.46        0.88       0.38   98.1        3.75
Class 5 - Background I     1.84        0.38       0.88   0.26       41.75
Class 6 - Background II    3.68       12.88       0.88   0.00       23.00
Class 7 - Background III   0.00        1.75       2.12   1.13       28.75
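Because pLSA was run with seven classes, three of which correspond to the background, the table is easier to read once the three background rows are summed into one. The sketch below does that in plain Python. It assumes that rows are the classes assigned by pLSA and columns are the true classes, with values in percent. The "True class →" header and the fact that each column sums to roughly 100 support this reading. The function name `merge_background` is hypothetical.

```python
# Rows: class assigned by pLSA; columns: true class (percent, from Table 3.1).
TRUE_CLASSES = ["Faces", "Motorbikes", "Airplanes", "Cars", "Background"]
CONFUSION = [
    [94.02, 0.00, 0.38, 0.00, 1.00],    # Class 1 - Faces
    [0.00, 83.62, 0.12, 0.00, 1.25],    # Class 2 - Motorbikes
    [0.00, 0.50, 95.25, 0.52, 0.50],    # Class 3 - Airplanes
    [0.46, 0.88, 0.38, 98.1, 3.75],     # Class 4 - Cars
    [1.84, 0.38, 0.88, 0.26, 41.75],    # Class 5 - Background I
    [3.68, 12.88, 0.88, 0.00, 23.00],   # Class 6 - Background II
    [0.00, 1.75, 2.12, 1.13, 28.75],    # Class 7 - Background III
]


def merge_background(table, first_bg_row=4):
    """Sum all rows from first_bg_row onward into a single background row."""
    kept = [row[:] for row in table[:first_bg_row]]
    merged_row = [sum(col) for col in zip(*table[first_bg_row:])]
    return kept + [merged_row]


merged = merge_background(CONFUSION)

# After merging, the diagonal holds the percentage of each true class
# that was assigned to the matching class.
per_class_correct = {name: merged[i][i] for i, name in enumerate(TRUE_CLASSES)}
for name, pct in per_class_correct.items():
    print(f"{name}: {pct:.2f}%")
```

The overall accuracy cannot be recomputed from these percentages alone, since it also depends on the number of images per category, which the table does not give.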

classes and it is generated from a Dirichlet prior $\mathrm{Dir}(\pi_j; \alpha)$. Each patch $i$ on image $j$ is assigned to one of the $K$ object classes, and its label $z_{ji}$ is sampled from a discrete distribution $\mathrm{Discrete}(z_{ji}; \pi_j)$ given by $\pi_j$. The observed visual word $w_{ji}$ is sampled from the model of its object class: $\mathrm{Discrete}(w_{ji} \mid \phi_{z_{ji}})$. $\alpha$ and $\beta$ are hyperparameters. $\phi_k$, $\pi_j$ and $z_{ji}$ are hidden variables to be inferred. The inference can be implemented

by variational methods [10] or collapsed Gibbs sampling [54]. Under LDA, if two visual words often co-occur in the same images, one of the object class models will assign high probability to both of them. pLSA and LDA perform similarly on image classification and object segmentation, and their results were promising, especially when each image contained only one object. As reported by [50], on a data set consisting of 4,090 images of five categories from the Caltech 101 database [55], the image classification accuracy achieved by pLSA was 92.5% (see Table 3.1) and its object segmentation accuracy was 49%. Both pLSA and LDA require the number of object classes to be known in advance. As an extension, the Hierarchical Dirichlet Process (HDP) proposed by Teh et al. [54] can automatically learn the number of object classes from data using Dirichlet Processes [56] as priors.
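The generative process of LDA described above can be sketched in a few lines of Python. This is only an illustration of the sampling steps, not an inference procedure. It assumes symmetric Dirichlet priors, and it assumes that each class model is itself drawn from a Dirichlet prior with parameter `beta`, a step the text above does not show. The values of `K`, `V`, `alpha` and `beta` and all function names are illustrative.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw from Dir(alphas) by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_discrete(probs, rng):
    """Draw an index from the discrete distribution given by probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_image(phi, alpha, n_patches, rng):
    """Sample the labels z_ji and visual words w_ji of one image j."""
    num_classes = len(phi)
    pi_j = sample_dirichlet([alpha] * num_classes, rng)  # pi_j ~ Dir(alpha)
    labels, words = [], []
    for _ in range(n_patches):
        z = sample_discrete(pi_j, rng)      # z_ji ~ Discrete(pi_j)
        w = sample_discrete(phi[z], rng)    # w_ji ~ Discrete(phi_{z_ji})
        labels.append(z)
        words.append(w)
    return pi_j, labels, words


rng = random.Random(0)
K, V = 3, 8               # illustrative: 3 object classes, 8 visual words
alpha, beta = 0.5, 0.5    # illustrative hyperparameter values

# Assumption: each class model phi_k is drawn from a symmetric Dir(beta).
phi = [sample_dirichlet([beta] * V, rng) for _ in range(K)]
pi_j, labels, words = generate_image(phi, alpha, n_patches=20, rng=rng)
print("class proportions:", [round(p, 3) for p in pi_j])
print("patch labels:", labels)
print("visual words:", words)
```

Inference runs in the opposite direction: only `words` is observed, and `phi`, `pi_j` and `labels` have to be recovered, for example by the variational or Gibbs sampling methods cited above.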

3.4.2 SLDA

A shortcoming of using pLSA and LDA to segment objects is that they treat an image as a document of visual words and ignore the spatial structure among those words. The assumption that two types of patches from the same object class often appear in the same images is not strong enough. In the example shown in Fig. 3.9, although the sky is far from the vehicles, if they often appear together in the images of the data set, they would be clustered into the same topic (object class) by pLSA or LDA. Since most of this image is sky and building, an image patch on a vehicle is likely to be labeled as building or sky as well. Such problems can be solved if the document of an image patch, such as the yellow patch in Fig. 3.9, includes only the patches falling within its neighborhood, marked by the red dashed window in Fig. 3.9, instead of the whole image.
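The difference between a whole-image document and a neighborhood document can be illustrated on a toy grid of visual-word ids. The sketch below shows only the idea of restricting a patch's document to a window around it; it is not the SLDA model itself. The square window, the `radius` parameter, the function names and the toy grid are all assumptions made for illustration.

```python
def whole_image_document(words):
    """pLSA/LDA view: every patch of the image belongs to one document."""
    return [w for row in words for w in row]


def local_document(words, row, col, radius):
    """Neighborhood view: words of patches within `radius` grid cells."""
    n_rows, n_cols = len(words), len(words[0])
    doc = []
    for r in range(max(0, row - radius), min(n_rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1)):
            doc.append(words[r][c])
    return doc


# Toy 5x6 grid of visual-word ids:
# 0 = sky-like, 1 = building-like, 2 = vehicle-like.
grid = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 2, 2, 2, 1, 1],
    [1, 2, 2, 2, 1, 1],
]

whole = whole_image_document(grid)
local = local_document(grid, row=4, col=2, radius=1)

# Fraction of vehicle-like words in each kind of document.
print("whole image:", whole.count(2) / len(whole))
print("local window:", local.count(2) / len(local))
```

In the whole-image document the vehicle-like words are a minority among sky-like and building-like words, whereas the window around a vehicle patch contains mostly vehicle-like words, which is the effect the paragraph above relies on.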
