representation. The branch-and-bound search scheme [5] is adopted to perform fast
hierarchical search for the optimal bounding boxes, leveraging a quality bound for
rectangle sets. We demonstrate that the quality function based on the Gaussianized
vector representation can be written as the sum of contributions from each feature
vector in the bounding box. Moreover, a quality bound can be obtained for any rect-
angle set in the image, with little computational cost, in addition to calculating the
Gaussianized vector representation for the whole image.
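The bound construction can be illustrated with a small sketch. Assuming each feature location contributes a precomputed score (positive or negative), the quality of a box is the sum of the scores inside it, and a rectangle set (intervals for the top, bottom, left, and right coordinates) can be bounded by summing the positive scores over the largest member rectangle and the negative scores over the smallest. The Python code below is a minimal, hypothetical sketch of such a branch-and-bound search in the spirit of [5]; the per-location score grid and all function names are illustrative assumptions, not the chapter's implementation.

```python
import heapq
import numpy as np

def integral(a):
    """Integral image with a zero top row / left column for easy box sums."""
    s = np.zeros((a.shape[0] + 1, a.shape[1] + 1))
    s[1:, 1:] = np.cumsum(np.cumsum(a, axis=0), axis=1)
    return s

def rect_sum(ii, t, b, l, r):
    """Sum of a[t:b+1, l:r+1] via the integral image ii."""
    return ii[b + 1, r + 1] - ii[t, r + 1] - ii[b + 1, l] + ii[t, l]

def quality_bound(ii_pos, ii_neg, state):
    """Upper bound on the quality of every rectangle in the set.

    state = (t_lo, t_hi, b_lo, b_hi, l_lo, l_hi, r_lo, r_hi): coordinate
    intervals.  Positive contributions are summed over the largest member
    rectangle, negative ones over the smallest (if it is non-empty).
    """
    t_lo, t_hi, b_lo, b_hi, l_lo, l_hi, r_lo, r_hi = state
    pos = rect_sum(ii_pos, t_lo, b_hi, l_lo, r_hi)
    neg = 0.0
    if t_hi <= b_lo and l_hi <= r_lo:
        neg = rect_sum(ii_neg, t_hi, b_lo, l_hi, r_lo)
    return pos + neg

def branch_and_bound(scores):
    """Return the box (t, b, l, r) maximizing the summed per-location scores."""
    h, w = scores.shape
    ii_pos = integral(np.maximum(scores, 0.0))
    ii_neg = integral(np.minimum(scores, 0.0))
    start = (0, h - 1, 0, h - 1, 0, w - 1, 0, w - 1)
    heap = [(-quality_bound(ii_pos, ii_neg, start), start)]
    while heap:
        neg_bound, state = heapq.heappop(heap)
        widths = [state[2 * i + 1] - state[2 * i] for i in range(4)]
        if max(widths) == 0:                     # a single rectangle remains
            t, _, b, _, l, _, r, _ = state
            return (t, b, l, r), -neg_bound
        i = int(np.argmax(widths))               # split the widest interval
        lo, hi = state[2 * i], state[2 * i + 1]
        mid = (lo + hi) // 2
        for half in ((lo, mid), (mid + 1, hi)):
            child = list(state)
            child[2 * i], child[2 * i + 1] = half
            child = tuple(child)
            if child[0] <= child[3] and child[4] <= child[7]:  # non-empty set
                heapq.heappush(heap,
                               (-quality_bound(ii_pos, ii_neg, child), child))
```

Because the bound over a rectangle set never underestimates the quality of any member rectangle, the best-first search always terminates at the globally optimal box without scanning all candidate windows.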
To achieve improved robustness to variation in the object class and the background, we propose a normalization approach that suppresses the within-class covariance of the Gaussianized vector representation in both the binary Support Vector Machine (SVM) classifier and the branch-and-bound search scheme.
We carry out object localization experiments on a multi-scale car dataset. The results show that the proposed object localization approach based on the Gaussianized vector representation outperforms a similar system using the branch-and-bound search based on the histogram-of-keywords representation. The normalization approach further improves the performance of the object localization system. These results suggest that the Gaussianized vector representation is effective for localization in addition to the classification and regression problems reported previously.
The rest of this chapter is organized as follows. Section 2 describes the construction of the Gaussianized vector representation. Section 3 presents the
normalization approach for robustness to object and background variation. Section
4 details the proposed efficient localization method based on the Gaussianized vec-
tor representation. The experimental results on multi-scale car detection are reported
in Section 5, followed by conclusions and discussion in Section 6. This chapter is
extended from our paper at the 1st International Workshop on Interactive Multime-
dia for Consumer Electronics at ACM Multimedia 2009 [16].
2 Gaussianized Vector Representation
The Gaussian mixture model (GMM) is widely used in various pattern recognition
problems [8, 7]. Recently, the Gaussianized vector representation was proposed.
This representation encodes an image as a bag of feature vectors, the distribution
of which is described by a GMM. Then a GMM supervector is constructed us-
ing the means of the GMM, normalized by the covariance matrices and Gaussian
component priors. A GMM-supervector-based kernel is designed to approximate
the Kullback-Leibler divergence between the GMMs of any two images, and is utilized
for supervised discriminative learning using an SVM. Variants of this GMM-based
representation have been successfully applied in several visual recognition tasks,
such as facial age estimation [11, 15], scene categorization [12] and video event
recognition [13].
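As a rough illustration of the supervector construction, the following Python sketch MAP-adapts the means of a diagonal-covariance GMM (the shared background model) to an image's bag of feature vectors, then stacks the adapted means normalized by the component priors and covariances. The relevance factor and the use of scikit-learn's `GaussianMixture` are assumptions for illustration, not the chapter's exact recipe.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_supervector(ubm, X, relevance=16.0):
    """Build a GMM supervector for the bag of feature vectors X.

    ubm: a fitted diagonal-covariance GaussianMixture shared by all images.
    The means are MAP-adapted toward X's statistics, then normalized by the
    component priors and covariances so that the linear kernel between two
    supervectors approximates a KL-divergence-based distance between the
    corresponding adapted GMMs.
    """
    post = ubm.predict_proba(X)                    # (n, K) responsibilities
    n_k = post.sum(axis=0)                         # soft counts per component
    f_k = post.T @ X                               # (K, d) first-order stats
    alpha = (n_k / (n_k + relevance))[:, None]     # adaptation coefficients
    mu_hat = alpha * (f_k / np.maximum(n_k, 1e-10)[:, None]) \
        + (1.0 - alpha) * ubm.means_
    # normalize each adapted mean by sqrt(prior_k) and Sigma_k^{-1/2}, stack
    w = np.sqrt(ubm.weights_)[:, None]
    sigma = np.sqrt(ubm.covariances_)              # (K, d) diagonal stddevs
    return (w * mu_hat / sigma).ravel()            # (K * d,) supervector
```

A linear SVM trained on such per-image supervectors then realizes the KL-divergence-approximating kernel described above.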
As pointed out by [12], the success of this representation can be attributed to
two properties. First, it establishes correspondence between feature vectors in different images in an unsupervised fashion. Second, it observes the standard normal distribution, making it more informative than the conventional histogram-of-keywords representation.
 