Fusion of Motion and Appearance for Robust People Detection in Cluttered Scenes - Intelligent Video Event Analysis and Understanding


noisy and unstable. To address this problem, in this work we adopt an alternative long-term motion estimation approach using background extraction and subtraction, given that most surveillance CCTV systems are based on fixed views. More precisely, we utilise the Gaussian mixture background model of [24]:

$$p\bigl(f(x,y)\bigr) = \sum_i \alpha_i \, g\bigl(f(x,y),\ \theta_{i,x,y},\ \sigma_{i,x,y}\bigr), \qquad (1)$$

where $(x, y)$ is the location of each pixel, $(\theta_{i,x,y}, \sigma_{i,x,y})$ are the model parameters of each individual Gaussian component $g$, and $f_t(x, y)$ is the local pixel intensity. Once the parameters are estimated, the likelihood of one frame $f_t(x, y)$ at time $t$ with respect to the background model is computed as the probability distance given by

$$p\bigl(f_t(x,y)\bigr) = \sum_i \alpha_i \exp\!\left(-\frac{\bigl(f_t(x,y) - \theta_{i,x,y}\bigr)^2}{\sigma_{i,x,y}^2}\right) \qquad (2)$$
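As an illustration, the per-pixel computation of Eq.(2) can be sketched as follows. This is a minimal sketch, assuming Eq.(2) has the form of a weighted sum of terms exp(-(f_t - θ_i)² / σ_i²); the function name `background_likelihood` and the two-component parameter values are hypothetical and chosen only for the example.

```python
import math

def background_likelihood(intensity, weights, means, sigmas):
    """Assumed form of Eq.(2) for one pixel:
    sum_i alpha_i * exp(-(f_t - theta_i)^2 / sigma_i^2)."""
    return sum(
        a * math.exp(-((intensity - m) ** 2) / (s ** 2))
        for a, m, s in zip(weights, means, sigmas)
    )

# Hypothetical two-component background model for a single pixel.
weights = [0.7, 0.3]     # alpha_i, assumed to sum to 1
means = [100.0, 180.0]   # theta_i
sigmas = [10.0, 20.0]    # sigma_i

# An intensity close to a background mode scores high ...
near_background = background_likelihood(102.0, weights, means, sigmas)
# ... while an intensity far from every mode scores close to zero.
far_from_background = background_likelihood(40.0, weights, means, sigmas)
```

Under these assumptions, a pixel whose intensity matches the learned background yields a value near the weight of the matching component, whereas a pixel that departs from all components yields a value near zero.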

This type of motion information is very effective at highlighting changes in the motion of every pixel in the scene. However, this is also an undesirable property, since the noisy motion caused by lighting changes is inevitably amplified; see Fig. 8 (b) for an example. To suppress the noisy motion caused by lighting changes, we further take spatial motion contrast into consideration in the Gaussian mixture model as follows:

$$v\bigl(f_t(x,y)\bigr) = \sum_i \alpha_i \exp\!\left(-\frac{\bigl(f_t(x,y) - \theta_{i,x,y}\bigr)^2}{\sigma_s^2}\right) \qquad (3)$$

In the background model of Eq.(2), $\sigma_{i,x,y}$ is the estimated strength of the motion of each pixel at $(x, y)$; we calculate $\sigma_s$ in Eq.(3) as the mean of $\sigma_{i,x,y}$. Examples of motion extraction using this model are shown in Fig. 8 (b) and (c): in (b) motion was estimated using the Gaussian mixture background model without considering spatial motion contrast, whilst in (c) it was taken into account. This clearly demonstrates the effectiveness of the spatial motion contrast measure given by Eq.(3) for removing motion noise, compared with existing Gaussian mixture models.
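The role of σ_s can be illustrated with a small sketch. It assumes that Eq.(3) keeps the form of Eq.(2) with σ_{i,x,y} replaced by a single shared σ_s, and that "the mean of σ_{i,x,y}" is taken over all components and all pixels; the text does not state the averaging range, so this is only one possible reading. The function names and the tiny 1x2 frame are hypothetical.

```python
import math

def spatial_sigma(sigmas):
    """Mean of sigma_{i,x,y} over all components and pixels (an assumed reading)."""
    values = [s for row in sigmas for pixel in row for s in pixel]
    return sum(values) / len(values)

def contrast_confidence(frame, weights, means, sigma_s):
    """Assumed form of Eq.(3) per pixel, with the shared sigma_s
    in place of the per-pixel sigma_{i,x,y}."""
    return [
        [
            sum(a * math.exp(-((f - m) ** 2) / sigma_s ** 2)
                for a, m in zip(weights[r][c], means[r][c]))
            for c, f in enumerate(row)
        ]
        for r, row in enumerate(frame)
    ]

# Hypothetical 1x2 frame with one Gaussian component per pixel.
frame = [[100.0, 130.0]]
weights = [[[1.0], [1.0]]]     # alpha_i per pixel
means = [[[100.0], [100.0]]]   # theta_i per pixel
sigmas = [[[5.0], [15.0]]]     # sigma_i per pixel

sigma_s = spatial_sigma(sigmas)
v = contrast_confidence(frame, weights, means, sigma_s)
```

In this sketch every pixel is normalised by the same scale σ_s, so the score depends only on how far the intensity lies from the background means, not on the variance estimated at that particular pixel.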

2.3 Spatial Motion Descriptor

Based on the background modelling described in Sec. 2.2, we can estimate a motion confidence measure for each hypothesis created by the static detector. The next step is to construct a robust hypothesis descriptor based on the motion information. Inspired by the success of the SIFT descriptor [16], we propose a multi-level spatial pyramid descriptor that directly utilises the motion confidence calculated from Eq.(3) to effectively describe the motion region of the hypothesis. The descriptor extraction procedure consists of the following steps:

1. Creating a codebook of confidence measures. Because the confidence $v(\cdot)$ is in principle a probability with $v \in [0, 1]$, we can create $C$ bins of the value with
