Digital Signal Processing Reference
Fig. 3.5 Examples of visual words obtained with the filter banks proposed in [2] and k-means. The
first row shows images and the second row shows their visual words. Colors represent different visual words
these difficulties, Nister et al. [28] proposed the vocabulary tree, constructed by
hierarchical k-means, which allowed a larger and more discriminative codebook to be
used efficiently. Moosmann et al. [29] proposed Extremely Randomized Clustering
Forests, ensembles of randomly created clustering trees, to learn the codebook.
These forests provided more accurate results and were faster than k-means. Elkan
[30] used the triangle inequality to dramatically accelerate k-means while
guaranteeing exactly the same result as standard k-means.
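The triangle-inequality idea can be illustrated with a minimal sketch. It covers only the assignment step and only one of the bounds that such an acceleration can exploit: if a center c' is at least twice as far from the current best center c as the point x is, then c' cannot be closer to x than c, so its distance need not be computed. The function names (`dist`, `assign_brute_force`, `assign_with_pruning`) and the toy data are assumptions made for illustration; this is not the full algorithm of [30].

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_brute_force(points, centers):
    """Nearest-center index for every point, computing all distances."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]


def assign_with_pruning(points, centers):
    """Same assignment, skipping distances ruled out by the triangle inequality.

    If d(c, c') >= 2 * d(x, c), then d(x, c') >= d(x, c), so c' cannot be
    strictly closer to x than the current best center c.
    """
    k = len(centers)
    # Center-to-center distances are computed once and shared by all points.
    center_dist = [[dist(centers[i], centers[j]) for j in range(k)]
                   for i in range(k)]
    labels, skipped = [], 0
    for p in points:
        best, best_d = 0, dist(p, centers[0])
        for j in range(1, k):
            if center_dist[best][j] >= 2.0 * best_d:
                skipped += 1  # distance to center j is provably not smaller
                continue
            d = dist(p, centers[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped


if __name__ == "__main__":
    # Hypothetical toy data: four well-separated centers and a few points.
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    points = [(0.1, 0.1), (9.5, 0.2), (0.3, 9.9), (5.2, 4.9)]
    labels, skipped = assign_with_pruning(points, centers)
    print("labels:", labels, "distance computations skipped:", skipped)
```

Because the pruning test only discards centers that cannot win, the labels agree with the brute-force assignment; the saving grows when clusters are well separated.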
K-means assumes hard assignment, i.e., it assigns exactly one visual word to each
image feature. If an image feature is relevant to multiple textons or visual words,
only the best one is selected. If none of the codewords in the codebook represents the
image feature well, the closest one is still assigned to it. Both cases may cause
problems during object segmentation. van Gemert et al. [31] created codebooks
using kernel density estimation, which models the uncertainty between visual words
and image features.
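The difference between hard assignment and a kernel-based soft assignment can be sketched as follows. The example assumes a Gaussian kernel with a hypothetical bandwidth parameter `sigma` and normalizes each feature's weights over the codewords; it is written in the spirit of the kernel-based idea and is not necessarily the exact formulation of [31]. The function names and the toy codebook are illustrative assumptions.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_histogram(features, codebook):
    """Each feature votes only for its single nearest codeword."""
    k = len(codebook)
    hist = [0.0] * k
    for f in features:
        nearest = min(range(k), key=lambda j: sq_dist(f, codebook[j]))
        hist[nearest] += 1.0
    n = len(features)
    return [h / n for h in hist] if n else hist


def soft_histogram(features, codebook, sigma=1.0):
    """Each feature spreads one unit of mass over all codewords.

    Weights follow a Gaussian kernel of the feature-to-codeword distance
    (assumed kernel choice), normalized so they sum to 1 per feature.
    """
    k = len(codebook)
    hist = [0.0] * k
    for f in features:
        d2 = [sq_dist(f, c) for c in codebook]
        d2_min = min(d2)
        # Subtracting the smallest distance avoids underflow to all zeros.
        w = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]
        total = sum(w)
        for j in range(k):
            hist[j] += w[j] / total
    n = len(features)
    return [h / n for h in hist] if n else hist


if __name__ == "__main__":
    # Hypothetical two-word codebook and a feature halfway between the words.
    codebook = [(0.0, 0.0), (2.0, 0.0)]
    features = [(1.0, 0.0)]
    print("hard:", hard_histogram(features, codebook))
    print("soft:", soft_histogram(features, codebook, sigma=1.0))
```

For the ambiguous feature in the usage example, hard assignment gives all of the mass to one word, while the soft version splits it evenly between the two equally distant words.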
The above approaches are unsupervised. Some supervised approaches learned
codebooks that incorporate semantic information. These codebooks were more compact
and discriminative. Winn et al. [2] learned an optimally compact visual codebook
by pairwise merging of visual words, given segmented images for training. Shotton
et al. [32] proposed semantic texton forests, which were randomized decision forests
[33] learned from image pixels. Perronnin et al. [34] learned different
codebooks for different object classes by adapting a universal codebook, which
described the content of all the classes of images, using class-specific data. Both the
universal codebook and the adapted class codebooks were used for classification.
3.3 Object Segmentation Using Discriminative Approaches
3.3.1 Classifiers on Local Appearance
The obtained histograms of textons or visual words within local regions capture the
features of local appearance and are usually used as the input of classifiers to predict