Efficient Object Localization with Variation-Normalized Gaussianized Vectors - Intelligent Video Event Analysis and Understanding


Obviously, over a rectangle set R, the first term in Equation 19 is largest when it takes all the positive contributions from the largest rectangle in the set. The second term in Equation 19 is negative, and its absolute value is smallest when it takes only the negative contributions in the smallest rectangle. Equation 18 is therefore an upper bound of Equation 19 for every rectangle in R.

Second, when the rectangle set R contains only one rectangle, $R_{\min} = R_{\max} = R$, and Equation 18 equals Equation 19: $\hat{f}(R) = f(R)$.
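Both properties can be checked numerically on toy data. The Python sketch below is an illustration, not the authors' code: features are hypothetical `(x, y, w)` triples in which `w` stands in for the per-feature contribution, rectangles are inclusive `(left, top, right, bottom)` tuples, and the rectangle set is assumed to be described by a range for each of its four edges, so that $R_{\max}$ and $R_{\min}$ are its outermost and innermost rectangles.

```python
from itertools import product

def inside(x, y, rect):
    """True if pixel (x, y) lies in rect = (left, top, right, bottom), inclusive."""
    left, top, right, bottom = rect
    return left <= x <= right and top <= y <= bottom

def quality(features, rect):
    """f(R): sum of the per-feature contributions w inside a single rectangle."""
    return sum(w for x, y, w in features if inside(x, y, rect))

def upper_bound(features, r_max, r_min):
    """Bound over a set: positive terms from R_max plus negative terms from R_min."""
    pos = sum(w for x, y, w in features if w > 0 and inside(x, y, r_max))
    neg = sum(w for x, y, w in features if w < 0 and inside(x, y, r_min))
    return pos + neg

def rectangles(lefts, tops, rights, bottoms):
    """Every valid rectangle whose four edges fall in the given ranges."""
    return [(left, top, right, bottom)
            for left, top, right, bottom in product(lefts, tops, rights, bottoms)
            if left <= right and top <= bottom]

# Toy data: five hypothetical features on a 5 x 5 grid.
features = [(1, 1, 2.0), (2, 3, -1.5), (4, 4, 0.5), (0, 4, -0.7), (3, 1, 1.2)]
lefts, tops, rights, bottoms = range(0, 2), range(0, 2), range(3, 5), range(3, 5)
r_max = (0, 0, 4, 4)  # smallest left/top edges, largest right/bottom edges
r_min = (1, 1, 3, 3)  # largest left/top edges, smallest right/bottom edges
bound = upper_bound(features, r_max, r_min)
```

Every rectangle in the set scores at most `bound`, and calling `upper_bound(features, R, R)` on a one-rectangle set reproduces `quality(features, R)` up to floating-point rounding.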

This quality bound defined by Equation 18 is used in the branch-and-bound scheme

discussed in Section 4.1 to achieve fast and effective detection and localization.
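The scheme of Section 4.1 is not reproduced in this section. As a hedged illustration, the sketch below assumes the interval parameterization popularized by efficient subwindow search, in which a rectangle set is given by an interval for each of its four edges. A best-first priority queue repeatedly splits the set with the highest bound, and the first single rectangle popped is optimal because its bound equals its exact quality while bounding every set still queued. The function names and the `(x, y, w)` feature format are hypothetical.

```python
import heapq

def inside(x, y, rect):
    """True if pixel (x, y) lies in rect = (left, top, right, bottom), inclusive."""
    left, top, right, bottom = rect
    return left <= x <= right and top <= y <= bottom

def bound(features, box_set):
    """Upper bound over a set of rectangles described by four edge intervals.

    box_set = (l_lo, l_hi, t_lo, t_hi, r_lo, r_hi, b_lo, b_hi). R_max takes the
    outermost edges and R_min the innermost ones; an empty R_min adds nothing.
    """
    l_lo, l_hi, t_lo, t_hi, r_lo, r_hi, b_lo, b_hi = box_set
    r_max = (l_lo, t_lo, r_hi, b_hi)
    r_min = (l_hi, t_hi, r_lo, b_lo)
    pos = sum(w for x, y, w in features if w > 0 and inside(x, y, r_max))
    neg = sum(w for x, y, w in features if w < 0 and inside(x, y, r_min))
    return pos + neg

def split(box_set):
    """Halve the widest edge interval, giving two disjoint rectangle sets."""
    widths = [box_set[2 * i + 1] - box_set[2 * i] for i in range(4)]
    k = widths.index(max(widths))
    lo, hi = box_set[2 * k], box_set[2 * k + 1]
    mid = (lo + hi) // 2
    first, second = list(box_set), list(box_set)
    first[2 * k + 1] = mid    # lower half: [lo, mid]
    second[2 * k] = mid + 1   # upper half: [mid + 1, hi]
    return tuple(first), tuple(second)

def has_valid_rectangle(box_set):
    """True if some left <= right and top <= bottom can be chosen from the set."""
    l_lo, _, t_lo, _, _, r_hi, _, b_hi = box_set
    return l_lo <= r_hi and t_lo <= b_hi

def branch_and_bound(features, width, height):
    """Best-first search; returns (score, (left, top, right, bottom))."""
    start = (0, width - 1, 0, height - 1, 0, width - 1, 0, height - 1)
    heap = [(-bound(features, start), start)]
    while heap:
        neg_score, box_set = heapq.heappop(heap)
        l_lo, l_hi, t_lo, t_hi, r_lo, r_hi, b_lo, b_hi = box_set
        if l_lo == l_hi and t_lo == t_hi and r_lo == r_hi and b_lo == b_hi:
            # One rectangle left: its bound is its exact quality, and no
            # set still in the queue has a higher bound.
            return -neg_score, (l_lo, t_lo, r_lo, b_lo)
        for child in split(box_set):
            if has_valid_rectangle(child):
                heapq.heappush(heap, (-bound(features, child), child))
    return None

features = [(0, 0, 1.0), (1, 1, 2.0), (2, 2, -3.0), (3, 1, 1.5),
            (1, 3, -0.5), (3, 3, 2.5), (0, 2, 0.7)]
score, rect = branch_and_bound(features, width=4, height=4)
```

On this toy grid the result matches an exhaustive scan over all rectangles, while typically evaluating far fewer candidate sets.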

Note that since the bound is based on a sum of per-feature-vector contributions, the approach can be repeated to find multiple bounding boxes in an image after removing the features claimed by the previously found boxes. This avoids a problem of the sliding-window approach, which tends to find multiple non-optimal boxes near a previously found box.
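The repeated search can be sketched as follows. To keep the example short, an exhaustive scan over all rectangles stands in for the branch-and-bound search (both return a best-scoring box); features are again hypothetical `(x, y, w)` triples, and `max_boxes` and `min_score` are illustrative stopping parameters that the text does not specify.

```python
def inside(x, y, rect):
    """True if pixel (x, y) lies in rect = (left, top, right, bottom), inclusive."""
    left, top, right, bottom = rect
    return left <= x <= right and top <= y <= bottom

def best_box(features, width, height):
    """Exhaustive stand-in for branch-and-bound: best (score, rect), or (0.0, None)."""
    best_score, best_rect = 0.0, None
    for left in range(width):
        for right in range(left, width):
            for top in range(height):
                for bottom in range(top, height):
                    rect = (left, top, right, bottom)
                    score = sum(w for x, y, w in features if inside(x, y, rect))
                    if score > best_score:
                        best_score, best_rect = score, rect
    return best_score, best_rect

def find_boxes(features, width, height, max_boxes=5, min_score=0.0):
    """Find a box, remove the features it claims, and search again."""
    remaining = list(features)
    boxes = []
    for _ in range(max_boxes):
        score, rect = best_box(remaining, width, height)
        if rect is None or score <= min_score:
            break
        boxes.append((score, rect))
        # Drop features claimed by the accepted box so later searches cannot reuse them.
        remaining = [(x, y, w) for x, y, w in remaining if not inside(x, y, rect)]
    return boxes

# Two hypothetical positive clusters separated by strongly negative features.
features = [(0, 0, 2.0), (1, 0, 2.0), (0, 1, 1.0), (5, 5, 3.0), (6, 5, 1.0),
            (3, 3, -5.0), (3, 0, -5.0), (0, 3, -5.0)]
boxes = find_boxes(features, width=7, height=7)
```

Because the first cluster's features are removed before the second search, the second box lands on the other cluster instead of a slightly shifted copy of the first box.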

Note that estimating $W_j$ in Equation 16 involves no more computation than the binary classifier that uses the Gaussianized vector representation of the whole image. To further expedite localization, two integral images [10] can be used to speed up the two summations in Equation 18, one image per summation. Each rectangle sum then costs a constant number of table lookups, which makes the calculation of $\hat{f}(R)$ independent of the number of rectangles in the set R.
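A minimal sketch of the two-integral-image idea, assuming the per-feature contributions have been accumulated on a pixel grid: one table holds the positive contributions and the other the negative ones, so each summation in Equation 18 reduces to four lookups. The helper names and the `(x, y, w)` feature format are hypothetical.

```python
def integral_image(grid):
    """ii[y][x] = sum of grid[v][u] over v < y and u < x (zero-padded first row/column)."""
    h, w = len(grid), len(grid[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0.0
        for x in range(w):
            row += grid[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row
    return ii

def rect_sum(ii, rect):
    """Sum over an inclusive rectangle (left, top, right, bottom) in four lookups."""
    left, top, right, bottom = rect
    if left > right or top > bottom:
        return 0.0  # empty rectangle, e.g. the R_min of a very loose set
    return (ii[bottom + 1][right + 1] - ii[top][right + 1]
            - ii[bottom + 1][left] + ii[top][left])

def build_bound_tables(features, width, height):
    """One integral image for positive contributions, one for negative ones."""
    pos = [[0.0] * width for _ in range(height)]
    neg = [[0.0] * width for _ in range(height)]
    for x, y, w in features:
        if w > 0:
            pos[y][x] += w
        else:
            neg[y][x] += w
    return integral_image(pos), integral_image(neg)

def fast_bound(tables, r_max, r_min):
    """Equation 18 style bound: positive sum over R_max plus negative sum over R_min."""
    ii_pos, ii_neg = tables
    return rect_sum(ii_pos, r_max) + rect_sum(ii_neg, r_min)

features = [(1, 1, 2.0), (2, 3, -1.5), (4, 4, 0.5), (0, 4, -0.7), (3, 1, 1.2)]
tables = build_bound_tables(features, width=5, height=5)
value = fast_bound(tables, r_max=(0, 0, 4, 4), r_min=(1, 1, 3, 3))
```

The tables are built once per image; after that, the cost of each bound evaluation no longer depends on how many rectangles the set contains or how large they are.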

4.4 Incorporating Variation-Normalization

To further improve the discriminating power of the Gaussianized vector representation in the localization problem, we incorporate the normalization approach of Section 3. In particular, this involves the following modifications to the proposed efficient localization system.

First, the SVM is trained using kernels normalized against within-class variation; that is, Equation 11 is used instead of Equation 8.

Second, Equation 13 is replaced by Equation 20, which suppresses the subspace corresponding to the largest within-class variation when evaluating the quality of the candidate regions.

$$f(Z) = g(Z) = \sum_t \alpha_t\, \phi(Z)^T \left(I - V C V^T\right) \phi(Z_t) - b. \qquad (20)$$

Third, the per-feature-vector contribution function in Equation 16 must be revised accordingly.
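Whatever the exact revised form of Equation 16, Equation 20 keeps the decision function linear in $\phi(Z)$: precomputing $\tilde{w} = \sum_t \alpha_t (I - VCV^T)\,\phi(Z_t)$ gives $f(Z) = \tilde{w}^T \phi(Z) - b$, so the normalization changes the weights rather than the structure of the score. The pure-Python check below is only a sketch; the dimensions and the values of V, C, the $\alpha_t$, b and the vectors are arbitrary stand-ins, not quantities from the text.

```python
import random

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def matvec(m, v):
    """Matrix (list of rows) times vector."""
    return [dot(row, v) for row in m]

def projection(V, C):
    """P = I - V C V^T for V with d rows and k columns and a k x k matrix C."""
    d, k = len(V), len(C)
    return [[(1.0 if i == j else 0.0)
             - sum(V[i][m] * C[m][n] * V[j][n] for m in range(k) for n in range(k))
             for j in range(d)]
            for i in range(d)]

def f_kernel_form(phi_z, support, alphas, P, b):
    """Equation 20 term by term: sum_t alpha_t phi(Z)^T P phi(Z_t) - b."""
    return sum(a * dot(phi_z, matvec(P, phi_t)) for a, phi_t in zip(alphas, support)) - b

def normalized_weight(support, alphas, P):
    """w = sum_t alpha_t P phi(Z_t), which can be computed once after training."""
    w = [0.0] * len(P)
    for a, phi_t in zip(alphas, support):
        for i, value in enumerate(matvec(P, phi_t)):
            w[i] += a * value
    return w

def f_linear_form(phi_z, w, b):
    """The same score as a single inner product: w^T phi(Z) - b."""
    return dot(w, phi_z) - b

# Arbitrary stand-in values; none of these come from the text.
random.seed(0)
d, k = 4, 2
V = [[random.uniform(-1, 1) for _ in range(k)] for _ in range(d)]
C = [[random.uniform(0, 1) for _ in range(k)] for _ in range(k)]
P = projection(V, C)
support = [[random.gauss(0, 1) for _ in range(d)] for _ in range(3)]
alphas, b = [0.5, -1.2, 0.8], 0.3
phi_z = [random.gauss(0, 1) for _ in range(d)]
w = normalized_weight(support, alphas, P)
```

The two forms agree up to rounding, so a single precomputed vector replaces the sum over support vectors at test time.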
