1 Introduction
Object localization predicts the bounding box of a specific object class within
an image. Effective object localization relies on an efficient and effective search
method, as well as on robust image representation and learning methods. The task
remains challenging due to within-class variations and the large search space of
candidate bounding boxes.
A straightforward way to carry out localization is the sliding window approach
[9], which applies learned classifiers to all candidate bounding boxes. However,
an exhaustive search in an $n \times n$ image needs to evaluate $O(n^4)$ candidate
bounding boxes, which is not affordable with complex classifiers. Heuristics about
possible bounding box locations, widths and heights, or local optimization methods
would then have to be used, resulting in false estimates. Despite great improvements
in computing capability, this intrinsic tradeoff between performance and efficiency
is undesirable, particularly for highly efficiency-sensitive applications. In recent
years, the most popular technique within the sliding window approach has been the
cascade [10], which decomposes a strong object/non-object classifier into a series of
simpler classifiers arranged in a cascade. However, the cascade is slow to train and
involves many empirical decisions. Moreover, it always reduces performance compared
with the original strong classifier. As an alternative to the sliding window approach,
Lampert et al. introduced a branch-and-bound search scheme [5], which efficiently finds
the globally optimal bounding box without the above problems.
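To make the $O(n^4)$ cost of exhaustive sliding-window search concrete, the sketch below enumerates every axis-aligned box in a grid and scores each one. It assumes, purely for illustration, that a box's classifier score is the sum of per-pixel weights inside it (as with a linear classifier over additive local-feature votes), so each box can be scored in constant time with a summed-area table. The weight map and the function names (`integral_image`, `box_sum`, `exhaustive_search`) are hypothetical. An $n \times n$ grid contains $(n(n+1)/2)^2$ boxes, which grows as $n^4/4$.

```python
from itertools import combinations_with_replacement
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive pixel indices


def integral_image(w: List[List[float]]) -> List[List[float]]:
    """Summed-area table with one row/column of zero padding: S[i][j] = sum of w[:i][:j]."""
    h, wd = len(w), len(w[0])
    S = [[0.0] * (wd + 1) for _ in range(h + 1)]
    for i in range(h):
        for j in range(wd):
            S[i + 1][j + 1] = w[i][j] + S[i][j + 1] + S[i + 1][j] - S[i][j]
    return S


def box_sum(S: List[List[float]], top: int, left: int, bottom: int, right: int) -> float:
    """Sum of the weights in rows top..bottom and columns left..right (inclusive), in O(1)."""
    return S[bottom + 1][right + 1] - S[top][right + 1] - S[bottom + 1][left] + S[top][left]


def exhaustive_search(w: List[List[float]]) -> Tuple[Optional[Box], float, int]:
    """Score every candidate box; return the best box, its score, and how many boxes were evaluated.

    Hypothetical additive scoring: a box's score is the sum of per-pixel weights it covers.
    """
    S = integral_image(w)
    h, wd = len(w), len(w[0])
    best_score, best_box, evaluated = float("-inf"), None, 0
    # Every (top <= bottom) row pair combined with every (left <= right) column pair:
    # h(h+1)/2 * wd(wd+1)/2 boxes, i.e. Theta(n^4) for an n x n image.
    for top, bottom in combinations_with_replacement(range(h), 2):
        for left, right in combinations_with_replacement(range(wd), 2):
            evaluated += 1
            score = box_sum(S, top, left, bottom, right)
            if score > best_score:
                best_score, best_box = score, (top, left, bottom, right)
    return best_box, best_score, evaluated


if __name__ == "__main__":
    weights = [[-1, -1, -1], [-1, 2, 3], [-1, -2, -1]]
    print(exhaustive_search(weights))  # ((1, 1, 1, 2), 5.0, 36)
```

Even with constant-time scoring, a 100 × 100 map already requires 25,502,500 box evaluations, so the cost becomes prohibitive once each evaluation involves a more complex classifier.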
Robust image representation and learning are critical to the success of many
computer vision applications. Successful features include Histograms of Oriented
Gradients [14] and Haar-like features [10]. Patch-based histogram-of-keywords image
representation methods represent an image as an ensemble of local features
discretized into a set of keywords. These methods have been successfully applied to
object localization [5] and image categorization [3]. The Gaussian mixture model
(GMM) is widely used for distribution modeling in speech recognition, speaker
identification and computer vision. Recently, the Gaussianized vector representation
was proposed as an innovative image and video vector representation based on the
GMM [12]. Variants of this Gaussianized vector representation have been successfully
applied in several applications related to interactive multimedia, such as facial
age estimation [11, 15], image scene categorization [12] and video event
recognition [13].
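As a concrete illustration of the patch-based histogram-of-keywords representation, the following sketch assigns each local feature descriptor to its nearest keyword and counts the assignments. The codebook is assumed to be given (in practice it is commonly learned, for example by k-means clustering of training descriptors), and the toy 2-D descriptors and the function names `nearest_keyword` and `keyword_histogram` are hypothetical; real local descriptors are much higher-dimensional.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_keyword(descriptor: Vector, codebook: List[Vector]) -> int:
    """Return the index of the codebook entry closest to `descriptor` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def keyword_histogram(
    descriptors: List[Vector], codebook: List[Vector], normalize: bool = True
) -> List[float]:
    """Quantize each descriptor to a keyword and return the keyword counts (optionally L1-normalized)."""
    counts = [0.0] * len(codebook)
    for desc in descriptors:
        counts[nearest_keyword(desc, codebook)] += 1
    if normalize and descriptors:
        total = float(len(descriptors))
        counts = [c / total for c in counts]
    return counts


if __name__ == "__main__":
    codebook = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    descriptors = [(1.0, 1.0), (9.0, 1.0), (8.0, -1.0), (0.0, 12.0)]
    print(keyword_histogram(descriptors, codebook))  # [0.25, 0.5, 0.25]
```

Because the unnormalized histogram simply counts keyword occurrences, the histogram of an image region is the sum of the contributions of the features that fall inside it.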
While the Gaussianized vector representation has proven effective in the above visual
recognition tasks, all of these are classification or regression problems that operate
on whole images. In contrast, the object detection or localization problem seeks the
rectangular bounding boxes of instances of a particular object at varying locations,
widths and heights. It is not clear how to use the Gaussianized vector representation
to capture localized information in addition to the global information in an image.
No work has yet explored applying the Gaussianized vector representation to the
object localization problem.
In this work, we present an object localization approach that combines the
efficient branch-and-bound search method with the robust Gaussianized vector