Efficient Object Localization with Variation-Normalized Gaussianized Vectors - Intelligent Video Event Analysis and Understanding

Information Technology Reference

In-Depth Information

and height of an object with four parameters, i.e., the top, bottom, left and right

coordinates

The sliding window approach is most widely used in object localization with

bounding boxes [9, 2]. To find the bounding box where the quality function f

reaches its global maximum, we need evaluate the function on all possible rect-

angles in the image, whose number is on the order of O

(

)

n 4

(

)

n image.

To reduce the computational cost, usually only rectangles at a coarse location grid

and of a small number of possible widths and heights are considered. On the other

hand, different approaches can be adopted to use a local optimum to approximate

the global one, when the quality function f has certain properties, such as smooth-

ness. All these approaches make detection tractable at the risk of missing the global

optimum, and with demand for well informed heuristics about the possible location

and sizes of the object.

In recent years, the most popular technique in the sliding window approach is the

cascade [10]. The cascade technique decomposes a strong object/non-object classi-

fier into a series of simpler classifiers. These classifiers are arranged in a cascade, so

that the simpler and weaker classifiers will eliminate most of the candidate bound-

ing boxes, before the more powerful and complicated classifiers will make finer

selection. However, the cascade of classifiers is slow to train. Moreover, it unfor-

tunately involves many empirical decisions, e.g., choosing the false alarm rate and

missing rate at each stage of the cascade. The cascade technique always reduces the

performance compared with the original strong classifier.

The branch-and-bound search scheme was recently introduced [5] to find the

globally optimal bounding box without the heuristics and assumptions about the

property of the quality function. It hierarchically splits the parameter space of all

the rectangles in an image, and discards large parts if their upper bounds fall lower

than an examined rectangle.

For localization based on bounding boxes, a set of rectangles is encoded with

for an n

[

]

, each indicating a continual interval for the corresponding parameter in

(

. The approach starts with a rectangle set containing all the rectangles in

the image, and terminates when one rectangle is found that has a quality function

no worse than the bounds f of any other rectangle set.

At every iteration, the parameter space

)

is split along the largest of the

four dimension, resulting in two rectangle sets both pushed into a queue together

with their upper bounds. The rectangle set with the highest upper bound is retrieved

from the queue for the next iteration.

The steps of the branch-and-bound search scheme can be summarized as follows:

1. Initialize an empty queue Q of rectangle sets. Initialize a rectangle set R to be

all the rectangles: T and B are both set to be the complete span from zero to the

height of the image. L and R are both set to be the complete span from zero to

the width of the image.

2. Obtain two rectangle sets by splitting the parameter space

[

]

[

]

along the

largest of the four dimension.

3. Push the two rectangle sets in Step 2 into queue Q with their respective quality

bound.

Intelligent Video Event Analysis and Understanding

Search WWH ::

Custom Search

Home