Efficient Object Localization with Variation-Normalized Gaussianized Vectors - Intelligent Video Event Analysis and Understanding


1 Introduction

Object localization predicts the bounding box of a specific object class within an image. Effective object localization relies on an efficient and effective search method, as well as on robust image representation and learning methods. The task remains challenging because of within-class variations and the large search space of candidate bounding boxes.
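To make the size of this search space concrete, here is a worked count added for illustration. In an $n \times n$ image, an axis-aligned box is fixed by choosing a top row $t$ no greater than a bottom row $b$, and a left column $l$ no greater than a right column $r$, with $t, b, l, r \in \{1, \dots, n\}$. The number of candidate boxes is therefore:

$$
N(n) \;=\; \underbrace{\binom{n+1}{2}}_{t \le b} \cdot \underbrace{\binom{n+1}{2}}_{l \le r} \;=\; \left(\frac{n(n+1)}{2}\right)^{2} \;=\; O(n^{4})
$$

For $n = 100$ this is already $5050^2 = 25{,}502{,}500$ boxes.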

A straightforward way to carry out localization is the sliding window approach [9], which applies learned classifiers to all candidate bounding boxes. However, an exhaustive search in an $n \times n$ image must evaluate $O(n^4)$ candidate bounding boxes, which is not affordable with complex classifiers. To keep the search tractable, ad hoc heuristics about possible bounding box locations, widths and heights, or local optimization methods would have to be used, which can result in false estimates. Despite the great improvement in computing capabilities, this intrinsic tradeoff between performance and efficiency is undesirable, particularly for applications that are highly sensitive to efficiency. In recent years, the most popular technique within the sliding window approach has been the cascade [10], which decomposes a strong object/non-object classifier into a series of simpler classifiers arranged in a cascade. However, the cascade is slow to train and involves many empirical decisions. Moreover, it always reduces performance compared with the original strong classifier. As an alternative to the sliding window approach, Lampert et al. introduced a branch-and-bound search scheme [5], which finds the globally optimal bounding box efficiently without the above problems.
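To illustrate the contrast between exhaustive search and branch-and-bound, the sketch below handles the simplest case. The classifier score of a box is assumed to be a sum of per-pixel weights inside it, as for a linear classifier over additive histogram features. The sketch runs a best-first branch-and-bound search in the spirit of Lampert et al. [5] and checks it against brute force. A state is a set of boxes described by intervals for top, bottom, left and right. Its upper bound adds the positive weights over the largest box in the set to the negative weights over the smallest box. The helper names (`integral`, `box_sum`, `ess_search`, `brute_force`) and the synthetic weight map are hypothetical. This is a generic illustration, not the chapter's localization method, which builds on Gaussianized vectors.

```python
import heapq

import numpy as np


def integral(w):
    """Zero-padded integral image: s[i, j] == w[:i, :j].sum()."""
    s = np.zeros((w.shape[0] + 1, w.shape[1] + 1))
    s[1:, 1:] = w.cumsum(axis=0).cumsum(axis=1)
    return s


def box_sum(s, top, bot, left, right):
    """Sum over rows top..bot and columns left..right (inclusive); 0.0 if empty."""
    if top > bot or left > right:
        return 0.0
    return s[bot + 1, right + 1] - s[top, right + 1] - s[bot + 1, left] + s[top, left]


def ess_search(w):
    """Hypothetical best-first branch-and-bound over sets of boxes.

    A state is four inclusive intervals: (top, bottom, left, right).
    Returns (box, score, number of states popped from the priority queue).
    """
    h, wd = w.shape
    pos = integral(np.maximum(w, 0.0))  # positive part of the weight map
    neg = integral(np.minimum(w, 0.0))  # negative part of the weight map

    def bound(state):
        (t0, t1), (b0, b1), (l0, l1), (r0, r1) = state
        # positives over the largest box + negatives over the smallest box
        return box_sum(pos, t0, b1, l0, r1) + box_sum(neg, t1, b0, l1, r0)

    start = ((0, h - 1), (0, h - 1), (0, wd - 1), (0, wd - 1))
    heap = [(-bound(start), start)]  # heapq is a min-heap, so store -bound
    popped = 0
    while heap:
        neg_ub, state = heapq.heappop(heap)
        popped += 1
        widths = [hi - lo for lo, hi in state]
        if max(widths) == 0:
            # A single box: its bound is its exact score and is at least as
            # large as every bound left in the queue, so it is optimal.
            top, bot, left, right = (lo for lo, _ in state)
            return (top, bot, left, right), -neg_ub, popped
        k = int(np.argmax(widths))  # split the widest interval in half
        lo, hi = state[k]
        mid = (lo + hi) // 2
        for part in ((lo, mid), (mid + 1, hi)):
            child = state[:k] + (part,) + state[k + 1:]
            (t0, _), (_, b1), (l0, _), (_, r1) = child
            if t0 > b1 or l0 > r1:
                continue  # no valid box (top <= bottom, left <= right) remains
            heapq.heappush(heap, (-bound(child), child))
    raise ValueError("empty weight map")


def brute_force(w):
    """Exhaustive sliding-window search; O(n^4) boxes for an n x n map."""
    s = integral(w)
    h, wd = w.shape
    best, best_box, count = -np.inf, None, 0
    for top in range(h):
        for bot in range(top, h):
            for left in range(wd):
                for right in range(left, wd):
                    count += 1
                    v = box_sum(s, top, bot, left, right)
                    if v > best:
                        best, best_box = v, (top, bot, left, right)
    return best_box, best, count


rng = np.random.default_rng(0)
w = rng.normal(-0.2, 1.0, size=(16, 16))  # synthetic per-pixel weights
w[4:9, 6:12] += 2.0                        # hypothetical "object" region
print(ess_search(w))
print(brute_force(w))                      # evaluates 136**2 = 18496 boxes
```

Both searches should report the same optimal score. The third value returned by `ess_search` shows how many box sets the branch-and-bound actually examined, compared with the full count from `brute_force`.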

Robust image representation and learning are critical to the success of various computer vision applications. Successful features include the Histogram of Oriented Gradients [14] and Haar-like features [10]. Patch-based histogram-of-keywords image representation methods represent an image as an ensemble of local features discretized into a set of keywords. These methods have been successfully applied to object localization [5] and image categorization [3]. The Gaussian mixture model (GMM) is widely used for distribution modeling in speech recognition, speaker identification and computer vision. Recently, the Gaussianized vector representation was proposed as an innovative image and video vector representation based on the GMM [12]. Variants of this Gaussianized vector representation have been successfully applied in several applications related to interactive multimedia, such as facial age estimation [11, 15], image scene categorization [12] and video event recognition [13].
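The general idea of turning a variable-size set of local features into one fixed-length vector through a GMM can be sketched as follows. Posterior-weighted MAP adaptation of the means of a global GMM (a "universal background model") yields image-specific means. Each adapted mean is scaled by $\sqrt{w_k}\,\Sigma_k^{-1/2}$, and the results are concatenated. This follows the common GMM-supervector recipe and is only an assumption about how [12] proceeds: the relevance factor `r`, the diagonal covariances, and the names `log_gauss_diag` and `gaussianized_vector` are hypothetical, and the variation normalization named in this chapter's title is not shown. In practice the GMM would be trained with EM on features pooled from many images; random parameters stand in for it here.

```python
import numpy as np


def log_gauss_diag(x, means, variances):
    """Diagonal-Gaussian log-densities: x (N, D), means/variances (K, D) -> (N, K)."""
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (x.shape[1] * np.log(2.0 * np.pi)
                   + np.log(variances).sum(axis=1)[None, :]
                   + (diff ** 2 / variances[None, :, :]).sum(axis=2))


def gaussianized_vector(x, weights, means, variances, r=10.0):
    """Hypothetical GMM-based image vector (assumed formulation, see text).

    x: local features of one image, shape (N, D).
    weights, means, variances: global diagonal GMM, shapes (K,), (K, D), (K, D).
    r: assumed MAP relevance factor.
    """
    log_p = np.log(weights)[None, :] + log_gauss_diag(x, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)     # numerical stability
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)       # posteriors p(k | x_i), (N, K)
    n_k = post.sum(axis=0)                        # soft counts per component, (K,)
    weighted_sum = post.T @ x                     # sum_i p(k | x_i) x_i, (K, D)
    adapted = (r * means + weighted_sum) / (r + n_k)[:, None]  # MAP-adapted means
    # Scale each component by sqrt(w_k) * Sigma_k^{-1/2}, then concatenate.
    return (np.sqrt(weights)[:, None] * adapted / np.sqrt(variances)).ravel()


rng = np.random.default_rng(0)
K, D = 4, 8                                       # toy GMM size and feature dim
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = np.ones((K, D))
feats = rng.normal(size=(50, D))                  # stand-in for local descriptors
v = gaussianized_vector(feats, weights, means, variances)
print(v.shape)                                    # (32,)
```

Every image maps to a vector of length $K \cdot D$, however many local features it contains, so standard vector-space classifiers or regressors can be applied directly.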

While the Gaussianized vector representation has proved effective in the above visual recognition tasks, these are all classification or regression problems that operate on whole images. In contrast, object detection or localization seeks the rectangular bounding boxes of instances of a particular object, which vary in location, width and height. However, it is not clear how to use the Gaussianized vector representation to capture localized information in addition to global information in an image. No prior work has explored applying the Gaussianized vector representation to the object localization problem.

In this work, we present an object localization approach combining the efficient branch-and-bound search method with the robust Gaussianized vector

