Digital Signal Processing Reference
Fig. 3.2 Typical steps of semantic object segmentation. They are done over image pixels, patches
or oversegmented superpixels
objects. Their responses are typically quantized into textons or visual words
according to codebooks learned in a supervised or unsupervised way. The histograms of
textons or visual words are used as input to a classifier that predicts object-class
labels. To capture local consistency and long-range contextual information, CRFs or
generative models are combined with the local classifiers.
These steps can be carried out on image pixels, patches, or oversegmented superpixels. Many
different techniques have been developed to improve each of the three steps. We
will review these techniques and discuss the major challenges for these steps. In
recent years, several benchmark databases, such as PASCAL VOC 2007 [5], PASCAL
VOC 2008 [6], PASCAL VOC 2009 [1], LabelMe [7], LHI [8], and MSRC21 [2],
have been published to evaluate the performance of different semantic object
segmentation approaches.
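As a rough illustration of the quantization and histogram steps described above, the following minimal sketch learns a codebook with plain k-means (one unsupervised option) and turns the filter responses of one patch into a normalized texton histogram. The function names, the evenly spaced initialization, and the toy two-dimensional responses are all assumptions made for this example; they are not taken from the chapter, and a real system would use high-dimensional filter-bank or descriptor responses.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two response vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, v):
    """Index of the codeword (texton) closest to response vector v."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], v))

def learn_codebook(responses, k, iters=20):
    """Plain k-means over filter-response vectors (hypothetical helper).

    Centres are initialised from evenly spaced samples, a simplification
    chosen here only to keep the example deterministic.
    """
    step = len(responses) // k
    codebook = [list(responses[i * step]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in responses:
            groups[nearest(codebook, v)].append(v)
        for j, g in enumerate(groups):
            if g:  # keep the old centre if a cluster is empty
                codebook[j] = [sum(col) / len(g) for col in zip(*g)]
    return codebook

def texton_histogram(codebook, responses):
    """Normalised histogram of texton indices for one patch or superpixel."""
    hist = [0.0] * len(codebook)
    for v in responses:
        hist[nearest(codebook, v)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Toy responses forming two well-separated groups (made-up data).
responses = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
             (5.0, 5.1), (5.1, 4.9), (4.9, 5.0)]
codebook = learn_codebook(responses, k=2)

# Responses observed inside one patch; the histogram is the classifier input.
patch = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (0.05, 0.0)]
hist = texton_histogram(codebook, patch)
print(hist)
```

The resulting histogram would then be passed to whichever local classifier is used; the classifier itself is outside the scope of this sketch.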
In video segmentation, Markov random fields (MRFs) and CRFs are the two main
frameworks. Statistically, video segmentation formulates and maximizes the posterior
probability of the labels given the observation data. When there are no labeled
data, or only a small number, distributions chosen heuristically or from prior
knowledge can be used to describe the observation data. Based on the selected
distributions and the prior over labels modeled by an MRF, MRF approaches
formulate the posterior via likelihoods and priors using Bayes' rule. In contrast, CRFs
model the posterior directly, which improves predictive performance when large
quantities of training data are available. In CRFs, the model of the observation data
is obtained by learning from the training data using classifiers. Compared to MRFs, CRFs
relax the assumption of data independence, but they require larger amounts of
expensive labeled data.
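The contrast between the two frameworks can be written out in a commonly used form. The notation is assumed here rather than taken from the chapter: x denotes the labels, y the observation data, V_c the clique potentials of the MRF prior, and \phi_i and \psi_{ij} the unary and pairwise CRF potentials. The factorized likelihood in the first line is the usual data-independence assumption, which the CRF form does not need.

```latex
% MRF (generative): posterior assembled from a likelihood and a label prior via Bayes' rule
P(x \mid y) = \frac{P(y \mid x)\, P(x)}{P(y)}
  \;\propto\; \Big[ \prod_{i} P(y_i \mid x_i) \Big]
  \cdot \frac{1}{Z} \exp\Big( -\sum_{c \in \mathcal{C}} V_c(x_c) \Big)

% CRF (discriminative): posterior modeled directly; potentials may depend on all of y
P(x \mid y) = \frac{1}{Z(y)} \exp\Big( -\sum_{i} \phi_i(x_i, y)
  - \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(x_i, x_j, y) \Big)

% In both cases the segmentation is the maximum a posteriori labeling
\hat{x} = \arg\max_{x} P(x \mid y)
```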
This chapter is organized as follows. Section 3.2 introduces different types of
filter banks and visual descriptors for capturing local appearance, and different
techniques for quantizing their responses into textons or visual words. Some popular
classifiers on local appearance are reviewed in Sect. 3.3.1. Section 3.3.2 introduces
CRFs and different approaches to using CRFs for semantic object segmentation.
Section 3.4 first introduces two classical topic models, Probabilistic Latent
Semantic Analysis [9] (pLSA) and Latent Dirichlet Allocation [10] (LDA), which
were borrowed directly from language processing and applied to semantic object
segmentation. Both pLSA and LDA ignore the spatial distribution of image
patches. Spatial Latent Dirichlet Allocation [11], which is an extension of LDA,
and other topic models incorporating spatial structures of objects are introduced in
Sects. 3.4.2 and 3.4.3. Approaches to object segmentation in videos are
discussed in Sect. 3.5. Finally, a summary is given in Sect. 3.6.