P(l) in Eq. (6) is the prior associated with each pyramid level and is used to combine the conditional likelihoods of the levels; we denote it as β_l. Motivated by the theory of spatial pyramid matching, we determine the weight β_l from the formulation of a maximum-weight problem [8, 14]: it is inversely proportional to the cell width at that level, as shown in Eq. (8). Intuitively, we want to penalize likelihoods found in larger cells because they involve increasingly dissimilar features. Taking all of this into consideration, we calculate the pyramid likelihood as follows:
p(Y) = \sum_{l=0}^{L} \beta_l \, p(Y \mid l) = \frac{1}{2^{L}} \, p(Y \mid l=0) + \sum_{l=1}^{L} \frac{1}{2^{L-l+1}} \, p(Y \mid l) \qquad (8)
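The weighting in Eq. (8) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the helper names `pyramid_weights` and `pyramid_likelihood` are hypothetical, and the per-level likelihoods p(Y | l) are assumed to be given as plain numbers.

```python
# Sketch of the pyramid-likelihood combination in Eq. (8).
# beta_0 = 1/2^L, and beta_l = 1/2^(L-l+1) for l = 1..L, so finer
# levels (larger l, smaller cells) receive larger weights.

def pyramid_weights(L):
    """Spatial-pyramid weights for levels 0..L; they sum to 1."""
    betas = [1.0 / 2 ** L]  # level 0
    betas += [1.0 / 2 ** (L - l + 1) for l in range(1, L + 1)]
    return betas

def pyramid_likelihood(level_likelihoods):
    """Combine p(Y|l) for l = 0..L into p(Y) using the weights above."""
    L = len(level_likelihoods) - 1
    return sum(b * p for b, p in zip(pyramid_weights(L), level_likelihoods))
```

For L = 2 this gives weights (1/4, 1/4, 1/2), so the finest level contributes half of the total likelihood, matching the intuition that large cells should be penalized.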
It is worth noting that our motion descriptor has a high dimensionality; e.g., the dimensionality of the descriptor at level 2 is 320. Directly using a Gaussian Mixture Model would therefore involve estimating thousands of parameters, which is typically time consuming and quite unstable in a high-dimensional space. To avoid this, we first use Principal Component Analysis (PCA) to reduce the dimensionality of the motion descriptor at each pyramid level, and then use a GMM to model its distribution. The effectiveness of PCA-GMM has been demonstrated in [15]. In our work, the GMM is learned by maximizing the likelihood function using Expectation-Maximization, and the number of Gaussian components is determined automatically using the Minimum Description Length criterion [19]. The modeling process of the pyramid Gaussian mixture is illustrated in Fig. 3. Once the parameters (\hat{\Theta}) are estimated, given a motion template m with pyramid motion descriptor Y_t under a people hypothesis h, the likelihood with respect to the learned human model is calculated as follows:
p(m \mid h, o) = p(Y_t \mid \hat{\Theta}) = \sum_{l} \beta_l \, p(Y_t \mid l, \hat{\theta}_l) = \sum_{l} \beta_l \sum_{i \in I(l)} \hat{\alpha}_i \, \mathcal{N}(Y_t(l); \hat{u}_i, \hat{\Sigma}_i) \qquad (9)
where (\hat{\alpha}_i, \hat{u}_i, \hat{\Sigma}_i) are the estimated parameters of the Gaussian mixture. Since Y_t is computed from a given motion template (m) of a hypothesis (h) produced by a static object class detector (o), it is natural to represent p(Y_t \mid \hat{\Theta}) as p(m \mid h, o). In the following section, we will use this notation to give a clear illustration of the verification process using the Bayesian graph model.
Verification. For a bounding box hypothesis, we wish to find the probability of the presence of an object given its motion template m and appearance measure c, p(o \mid c, m, h), which is given by Bayes' rule as follows:
p(o \mid c, m, h) = \frac{1}{Z} \, p(m \mid h, o) \, p(c \mid h, o) \, p(h \mid o) \qquad (10)
where Z is the normalization factor. In this model, we assume that the motion m and the appearance c are conditionally independent. To make this clearer, the directed graphical model of the Bayesian verification process (Eq. (10)) is shown in Fig. 4, where the arrows indicate the dependencies between variables.
p(m \mid h, o) is the contribution of the motion within the hypothesis bounding box given