Digital Signal Processing Reference
In-Depth Information
boosting to select from a dictionary connectivity templates, which were derived
from labeled segmentations. It exploited the contextual correlations between ob-
ject classes. Rabinovich et al. [ 46 ] explicitly defined the interactions between object
classes as semantic context and incorporated it into CRF. The semantic context was
modeled as the co-occurrence of object labels and was learned both from the train-
ing data and Google Sets. 2
Quattoni et al. [ 47 ] used CRF for part-based object recognition and detection.
CRF was used to model the spatial arranges of object parts. Ma and Grimson [ 48 ]
proposed a coupled CRF to decompose the images into contour and texture and to
model their interaction. The decomposed low-level cues were adaptively combined
for object recognition and different discriminative cues for different object classes
were fully leveraged. Reynolds and Murphy [ 49 ] proposed a tree-structured CRF
for object segmentation.
3.4
Object Segmentation Using Topic Models
The discriminative approaches described above required training data to be labeled
at pixel-level. If there are a large number of object classes to be modeled, the la-
beling work is very expensive. Some researchers started to explore approaches of
learning the models of object classes from a collection of images or videos with-
out supervision or with weak supervision (such as using training data labeled at
image-level). Inspired by the success of topic models, such as Probabilistic Latent
Semantic Analysis (pLSA) [ 9 ] and Latent Dirichlet Allocation (LDA) [ 10 ], in the
applications of language processing, they have been also applied to semantic object
segmentation in recent years. Under pLSA or LDA, words, such as “professor” and
“university”, often co-existing in the same documents are clustered into the same
topic, such as “education”. The models of topics are automatically without supervi-
sion. The word-document analysis has been applied to object segmentation through
mapping the concepts of “words” and “documents” to the image and video domains.
For example, if images are treated as documents and visual words (or textons) are
treated as words, with the assumption that visual words of the same object classes
often co-exist in the same images, the models of object classes can be learned as the
models of topics. Object classes are treated as topics. Since an image may include
objects of several classes, it is modeled as a mixture of topics. An advantage of such
an approach is that manually segmenting objects at the pixel level is not required for
training. Some proposed approaches [ 11 , 50 , 51 ] were totally unsupervised. Some
required labeling at the image level [ 52 , 53 ]. Some semantic object segmentation
approaches based on topics models will be reviewed in this section.
2 http://labs.google.com/sets
Search WWH ::




Custom Search