The Gaussianized vector representation is closely connected to the classic histogram-of-keywords representation. In the traditional histogram representation, the keywords are chosen by running the k-means algorithm on all the features. Each feature is assigned to a particular bin based on its distance to the cluster centroids. The histogram representation thus obtains a rough alignment between feature vectors by assigning each of them to one of the histogram bins. Such a representation provides a natural similarity measure between two images based on the difference between the corresponding histograms. However, the histogram representation has some intrinsic limitations. In particular, it is sensitive to feature outliers, the choice of bins, and the noise level in the data. Moreover, encoding high-dimensional feature vectors with a relatively small codebook results in large quantization errors and a loss of discriminability.
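For concreteness, a minimal sketch of this histogram-of-keywords baseline is given below; the codebook size and the use of scikit-learn's KMeans are illustrative assumptions rather than choices made in the text.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_keyword_histogram(train_features, image_features, n_keywords=256):
    """Histogram-of-keywords baseline: hard-assign each local feature of an
    image to its nearest k-means centroid and count assignments per bin."""
    # Learn the codebook from the pooled training features (labels unused).
    codebook = KMeans(n_clusters=n_keywords, n_init=10).fit(train_features)

    # Hard assignment: each feature vector falls into exactly one bin.
    bins = codebook.predict(image_features)

    # Normalized histogram: fraction of the image's features in each bin.
    hist = np.bincount(bins, minlength=n_keywords).astype(float)
    return hist / max(len(image_features), 1)

# Two images can then be compared by, e.g., the Euclidean distance
# between their normalized histograms.
```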
The Gaussianized vector representation enhances the histogram representation in the following ways. First, k-means clustering relies on the Euclidean distance, while the GMM leverages the Mahalanobis distance by means of the component posteriors. Second, k-means clustering assigns a single keyword to each feature vector, while the Gaussianized vector representation allows each feature vector to contribute to multiple Gaussian components statistically. Third, the histogram of keywords only uses the number of feature vectors assigned to the histogram bins, while the Gaussianized vector representation also engages the weighted mean of the features in each component, leading to a more informative representation.
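This contrast can be made concrete with the following sketch, which computes the soft component posteriors of a fitted GMM and the posterior-weighted mean of an image's features in each component; the use of scikit-learn's GaussianMixture and the helper name soft_statistics are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def soft_statistics(gmm: GaussianMixture, image_features: np.ndarray):
    """Per-component soft counts and posterior-weighted means for one image.

    Unlike hard k-means assignment, every feature vector contributes to all
    K Gaussian components, weighted by its posterior probability.
    """
    # posteriors[i, k] = p(component k | z_i), shape (N, K)
    posteriors = gmm.predict_proba(image_features)

    # Soft count of features attributed to each component, shape (K,)
    soft_counts = posteriors.sum(axis=0)

    # Posterior-weighted mean of the features in each component, shape (K, D)
    weighted_means = (posteriors.T @ image_features
                      / np.maximum(soft_counts, 1e-12)[:, None])
    return soft_counts, weighted_means
```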
2.1 GMM for Feature Vector Distribution
We estimate a GMM for the distribution of all patch-based feature vectors in an image. The estimated GMM is a compact description of the single image, less prone to noise than the raw feature vectors. Yet, with an increasing number of Gaussian components, the GMM can be arbitrarily accurate in describing the underlying feature vector distribution. The Gaussian components impose an implicit multi-mode structure on the feature vector distribution in the image. When the GMMs for different images are adapted from the same global GMM, the corresponding Gaussian components imply a certain correspondence.
In particular, we obtain one GMM for each image in the following way.
First, a global GMM is estimated using patch-based feature vectors extracted
from all training images, regardless of their labels. Here we denote z as a feature
vector, whose distribution is modeled by a GMM, a weighted linear combination of
K unimodal Gaussian components,
\[
p(z; \Theta^{\mathrm{global}}) = \sum_{k=1}^{K} w_k \, \mathcal{N}(z; \mu_k, \Sigma_k),
\]
where \(\Theta^{\mathrm{global}} = \{w_1, \mu_1, \Sigma_1, \cdots\}\), and \(w_k\), \(\mu_k\), and \(\Sigma_k\) are the weight, mean, and covariance matrix of the \(k\)-th Gaussian component, respectively.
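A minimal sketch of this first step, fitting a global GMM on pooled training features and evaluating p(z; Θ^global) as the weighted sum above, might look as follows; the synthetic data, the component count K = 16, and the scikit-learn estimator are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

# Stand-in for patch-based features pooled from all training images
# (labels are ignored at this stage); rows are feature vectors z of dim D.
rng = np.random.default_rng(0)
all_training_features = rng.normal(size=(5000, 8))        # synthetic (N, D)

K = 16  # number of Gaussian components (a hypothetical choice)
global_gmm = GaussianMixture(n_components=K, covariance_type="full",
                             random_state=0).fit(all_training_features)

def gmm_density(z, gmm):
    """p(z; Theta^global) = sum_k w_k * N(z; mu_k, Sigma_k)."""
    return sum(w * multivariate_normal.pdf(z, mean=mu, cov=cov)
               for w, mu, cov in zip(gmm.weights_, gmm.means_, gmm.covariances_))

print(gmm_density(all_training_features[0], global_gmm))
```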