2 Methodology
In contrast to video sequences captured at frame rate in well-controlled environments,
our people detection task must work in a highly cluttered public scene (underground)
given low-resolution data, often at a low frame rate. The scene also suffers from
(1) significant lighting changes, which make motion estimation unstable and noisy;
(2) heavy occlusions, which require the people detector to handle partial matches;
and (3) extensive background clutter, which can cause a high false alarm rate.
To this end, we propose a robust people detection method for video sequences that
fuses a detector based on static appearance features with a long-term motion based
spatial pyramid likelihood measure. An overview of our method is shown in Fig. 1.
[Fig. 1 flow chart; recoverable block labels: Image sequences, Sliding Window, HOG
Descriptor, Linear SVM, Appearance, Background Modeling, Differencing, Motion Pyramid,
Motion Modeling, Bayesian Verification, Person / non-person.]
Fig. 1 Flow chart of our method for pedestrian detection. An appearance based detector is
used to create the initial hypothesis and long-term motion is modeled by the motion pyramid
approach. The above cues are combined in a Bayesian framework. The final candidates are
selected by thresholding.
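
The appearance branch named in Fig. 1 (sliding window, HOG descriptor, linear SVM)
can be illustrated with a minimal sketch. It is not the authors' implementation and
not the full descriptor of [3]: it computes per-cell histograms of unsigned gradient
orientation without block normalization, and it rescales frames by nearest-neighbour
sampling. The 32 x 64 window is the size given in Section 2.1; the function names,
the cell size, the number of bins, the scale set, the stride and the random weights
standing in for a trained linear SVM are all assumptions made for illustration.

```python
import numpy as np

WIN_W, WIN_H = 32, 64  # detection window size stated in Section 2.1


def window_descriptor(window, cell=8, n_bins=9):
    """Simplified HOG-like feature: one orientation histogram per cell.

    Assumption: 8x8 cells and 9 unsigned orientation bins; block
    normalization from the full HOG descriptor is omitted.
    """
    gy, gx = np.gradient(window.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation
    bins = np.minimum((angle / np.pi * n_bins).astype(int), n_bins - 1)
    feats = []
    for y in range(0, window.shape[0], cell):
        for x in range(0, window.shape[1], cell):
            b = bins[y:y + cell, x:x + cell].ravel()
            m = magnitude[y:y + cell, x:x + cell].ravel()
            # Each pixel votes for its orientation bin, weighted by magnitude.
            feats.append(np.bincount(b, weights=m, minlength=n_bins))
    v = np.concatenate(feats)
    return v / (np.linalg.norm(v) + 1e-9)


def resize_nearest(image, scale):
    """Rescale a 2-D image by nearest-neighbour sampling (illustrative only)."""
    h = max(1, int(round(image.shape[0] * scale)))
    w = max(1, int(round(image.shape[1] * scale)))
    rows = np.minimum((np.arange(h) / scale).astype(int), image.shape[0] - 1)
    cols = np.minimum((np.arange(w) / scale).astype(int), image.shape[1] - 1)
    return image[np.ix_(rows, cols)]


def scan_frame(frame, weights, bias, scales=(1.0, 0.75, 0.5), stride=8):
    """Score every window at every scale with a linear classifier.

    Returns (x, y, width, height, score) tuples in original-frame coordinates.
    """
    hypotheses = []
    for s in scales:
        scaled = resize_nearest(frame, s)
        for y in range(0, scaled.shape[0] - WIN_H + 1, stride):
            for x in range(0, scaled.shape[1] - WIN_W + 1, stride):
                v = window_descriptor(scaled[y:y + WIN_H, x:x + WIN_W])
                score = float(weights @ v + bias)  # linear SVM decision value
                hypotheses.append((x / s, y / s, WIN_W / s, WIN_H / s, score))
    return hypotheses


# Usage with synthetic data; the random weights are a placeholder for a
# trained linear SVM, so the scores themselves carry no meaning.
rng = np.random.default_rng(0)
frame = rng.random((128, 96))
weights = rng.standard_normal(288)  # 8 x 4 cells x 9 bins
hyps = scan_frame(frame, weights, bias=0.0)
```

Scanning a downscaled frame with a fixed-size window is equivalent to scanning the
original frame with a larger window, which is why each hypothesis is mapped back to
original coordinates by dividing by the scale.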
2.1 Generating Hypothesis
We adopt the static people detector proposed by Dalal and Triggs [3] to generate static
human presence hypotheses in each frame. To achieve scale invariance, this detector
uses a multi-scale sliding window approach, i.e. it scans each frame at each scale
level. Each sub-window image patch centered at location i (denoted by v_i, where
i = 1 : n and n is the number of patches) is transformed into a feature vector
before being classified as either human foreground or scene background by a
classifier. The feature vector used here is a SIFT-like [16] feature based on
histograms of gradient orientation. The basic idea is that local object appearance and
shape can often be characterized rather well by the distribution of local intensity
gradients or edge directions, even without precise knowledge of the corresponding
gradient or edge positions (similar work in [21] uses histograms of scale-normalized,
oriented derivatives to detect and recognize arbitrary object classes). The size of
the detection window is 32 × 64, including 8 pixels of margin beyond the window size.
A linear SVM is used as the classifier and the output of the classifier