Semantic Object Segmentation - Video Segmentation and Its Applications


Fig. 3.5 Examples of visual words obtained by the filter-banks proposed in [2] and k-means. The first row shows images and the second row shows the corresponding visual words. Colors represent different visual words.

these difficulties, Nister et al. [28] proposed the vocabulary tree, constructed by hierarchical k-means, which allowed a larger and more discriminative codebook to be used efficiently. Moosmann et al. [29] proposed Extremely Randomized Clustering Forests, ensembles of randomly created clustering trees, to learn the codebook; these provided more accurate results and were faster than k-means. Elkan [30] used the triangle inequality to dramatically accelerate k-means while guaranteeing exactly the same result as standard k-means.
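The triangle-inequality idea can be illustrated on the assignment step alone. The following is a minimal sketch, not the full algorithm of [30] (which also maintains upper and lower bounds across iterations): if the distance between the current best center and another center is at least twice the distance from the point to the current best center, the other center cannot be closer and its distance need not be computed. The function names and the toy data are assumptions made for illustration.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_brute_force(points, centers):
    """Standard assignment step: every point is compared with every center."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]

def assign_with_pruning(points, centers):
    """Assignment step that skips centers ruled out by the triangle inequality.

    If d(c_best, c_j) >= 2 * d(p, c_best), then d(p, c_j) >= d(p, c_best),
    so c_j cannot be strictly closer and its distance is never computed.
    Returns the labels and the number of skipped distance computations.
    """
    k = len(centers)
    # Center-to-center distances are computed once and reused for all points.
    cc = [[dist(centers[i], centers[j]) for j in range(k)] for i in range(k)]
    labels, skipped = [], 0
    for p in points:
        best, best_d = 0, dist(p, centers[0])
        for j in range(1, k):
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1
                continue
            d = dist(p, centers[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped

if __name__ == "__main__":
    rng = random.Random(0)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    # Toy data: 50 points scattered around each center.
    points = [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
              for cx, cy in centers for _ in range(50)]
    labels, skipped = assign_with_pruning(points, centers)
    assert labels == assign_brute_force(points, centers)
    print("distance computations skipped:", skipped)
```

The pruned version returns the same labels as the brute-force version, which is the property the text attributes to [30]; the saving grows when the clusters are well separated.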

K-means assumes hard assignment, i.e., it assigns exactly one visual word to each image feature. If an image feature is relevant to multiple textons or visual words, only the best one is selected. If none of the codewords in the codebook represents the image feature well, the best one is still assigned to it. Both cases may cause problems during object segmentation. To address this, van Gemert et al. [31] created codebooks using kernel density estimation, which modeled the uncertainty between visual words and image features.
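The difference between hard and soft assignment can be sketched as follows. This is an illustration under stated assumptions, not the exact formulation of [31]: it uses a Gaussian kernel with a hypothetical bandwidth `sigma`, and the function names are invented for this example. Each feature distributes a total weight of one over all visual words instead of voting for a single word.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def hard_histogram(features, codebook):
    """Each feature votes only for its nearest visual word."""
    hist = [0.0] * len(codebook)
    for f in features:
        j = min(range(len(codebook)), key=lambda i: sq_dist(f, codebook[i]))
        hist[j] += 1.0
    return [h / len(features) for h in hist]

def soft_histogram(features, codebook, sigma=1.0):
    """Each feature spreads its vote over all words with Gaussian weights.

    sigma is an assumed kernel bandwidth; smaller values approach hard
    assignment, larger values spread the vote more evenly.
    """
    hist = [0.0] * len(codebook)
    for f in features:
        d2 = [sq_dist(f, w) for w in codebook]
        d2_min = min(d2)
        # Subtracting the smallest distance keeps at least one weight at 1,
        # which avoids dividing by zero when a feature is far from all words.
        weights = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]
        total = sum(weights)
        for j, wt in enumerate(weights):
            hist[j] += wt / total
    return [h / len(features) for h in hist]

if __name__ == "__main__":
    codebook = [(0.0, 0.0), (2.0, 0.0)]
    features = [(1.0, 0.0)]  # equally close to both visual words
    print("hard:", hard_histogram(features, codebook))
    print("soft:", soft_histogram(features, codebook))
```

For a feature that lies exactly between two visual words, hard assignment gives the whole vote to one of them, while the soft histogram splits it evenly, which is the ambiguity described above.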

The above approaches are unsupervised. Some supervised approaches learned codebooks that incorporate semantic information; these codebooks were more compact and discriminative. Winn et al. [2] learned an optimally compact visual codebook by pairwise merging of visual words, given segmented images for training. Shotton et al. [32] proposed semantic texton forests, which were randomized decision forests [33] learned from image pixels. Perronnin et al. [34] learned different codebooks for different object classes by adapting a universal codebook, which described the content of all the classes of images, using class-specific data. Both the universal codebook and the adapted class codebooks were used for classification.

3.3 Object Segmentation Using Discriminative Approaches

3.3.1 Classifiers on Local Appearance

The obtained histograms of textons or visual words within local regions capture the

features of local appearance and are usually used as the input of classifiers to predict
