Semantic Object Segmentation - Video Segmentation and Its Applications

boosting to select from a dictionary of connectivity templates, which were derived

from labeled segmentations. It exploited the contextual correlations between ob-

ject classes. Rabinovich et al. [ 46 ] explicitly defined the interactions between object

classes as semantic context and incorporated it into CRF. The semantic context was

modeled as the co-occurrence of object labels and was learned both from the train-

ing data and Google Sets. 2
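
To make the idea of co-occurrence-based semantic context concrete, the following is a minimal sketch of how label co-occurrence counts could be gathered from image-level annotations. It is not the model of Rabinovich et al.; the function name `label_cooccurrence` and the toy labels are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations


def label_cooccurrence(image_labels):
    """Count how often each unordered pair of labels appears in the same image."""
    pair_counts = Counter()
    for labels in image_labels:
        # Deduplicate and sort so each pair is stored once as (a, b) with a < b.
        for a, b in combinations(sorted(set(labels)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Hypothetical training annotations: one list of object labels per image.
training_labels = [
    ["cow", "grass", "sky"],
    ["cow", "grass"],
    ["boat", "water", "sky"],
    ["boat", "water"],
]

cooccurrence = label_cooccurrence(training_labels)
for pair, count in sorted(cooccurrence.items()):
    print(pair, count)
```

Pairs that never appear together, such as a boat and a cow in this toy data, simply receive a count of zero. A context model could use such counts to penalize implausible label combinations.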

Quattoni et al. [ 47 ] used CRF for part-based object recognition and detection.

CRF was used to model the spatial arrangement of object parts. Ma and Grimson [ 48 ]

proposed a coupled CRF to decompose the images into contour and texture and to

model their interaction. The decomposed low-level cues were adaptively combined

for object recognition and different discriminative cues for different object classes

were fully leveraged. Reynolds and Murphy [ 49 ] proposed a tree-structured CRF

for object segmentation.

3.4 Object Segmentation Using Topic Models

The discriminative approaches described above required training data to be labeled

at the pixel level. If a large number of object classes must be modeled, this labeling

work becomes very expensive. Some researchers therefore started to explore approaches

to learning the models of object classes from a collection of images or videos without

supervision or with weak supervision (such as training data labeled only at the

image level). Topic models, such as Probabilistic Latent Semantic Analysis (pLSA) [ 9 ]

and Latent Dirichlet Allocation (LDA) [ 10 ], have been successful in language

processing, and in recent years they have also been applied to semantic object

segmentation. Under pLSA or LDA, words that often co-occur in the same documents,

such as “professor” and “university”, are clustered into the same topic, such as

“education”. The models of topics are learned automatically, without supervision.

This word-document analysis has been applied to object segmentation by

mapping the concepts of “words” and “documents” to the image and video domains.

For example, if images are treated as documents and visual words (or textons) are

treated as words, with the assumption that visual words of the same object classes

often co-exist in the same images, the models of object classes can be learned as the

models of topics. Object classes are treated as topics. Since an image may include

objects of several classes, it is modeled as a mixture of topics. An advantage of such

an approach is that manually segmenting objects at the pixel level is not required for

training. Some proposed approaches [ 11 , 50 , 51 ] were totally unsupervised. Some

required labeling at the image level [ 52 , 53 ]. Some semantic object segmentation

approaches based on topic models will be reviewed in this section.
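
The images-as-documents mapping described above can be sketched in code. Under pLSA, the probability of visual word w in image d is decomposed as P(w|d) = sum over z of P(w|z) P(z|d), where the topics z play the role of object classes. The example below is a minimal pure-Python EM fit of that decomposition. It is an illustration under stated assumptions, not the implementation used in any of the cited papers: the function name `plsa`, the toy count matrix, and the choice of two topics are all hypothetical.

```python
import random


def plsa(counts, n_topics, n_iters=50, seed=0):
    """Fit pLSA by EM. counts[d][w] = occurrences of visual word w in image d."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [v / total for v in row]

    # Random initialization of P(w|z) and P(z|d).
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iters):
        # Small positive floor keeps every row strictly positive.
        new_w_z = [[1e-12] * n_words for _ in range(n_topics)]
        new_z_d = [[1e-12] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z|d,w) is proportional to P(w|z) P(z|d).
                post = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                total = sum(post)
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    resp = n * post[z] / total
                    new_w_z[z][w] += resp
                    new_z_d[d][z] += resp
        # M-step: renormalize expected counts into probabilities.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d


# Hypothetical bag-of-visual-words counts: 4 images, 6 visual words.
# Images 0-1 mostly contain words 0-2; images 2-3 mostly contain words 3-5.
toy_counts = [
    [5, 4, 6, 0, 0, 0],
    [6, 5, 4, 0, 0, 0],
    [0, 0, 0, 5, 6, 4],
    [0, 0, 0, 4, 5, 6],
]

topic_words, image_topics = plsa(toy_counts, n_topics=2)
for d, mixture in enumerate(image_topics):
    print("image", d, "topic mixture:", [round(p, 3) for p in mixture])
```

Each row of `image_topics` is the topic mixture of one image, which corresponds to modeling an image as a mixture of object classes. On data like this, images that share visual words would be expected to share a dominant topic, although EM only finds a local optimum and the result depends on the random initialization.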

2 http://labs.google.com/sets
