work in human detection (see [7] and [2] for a survey). This work can be broadly
categorized into two groups: static and dynamic people detectors. Static people
detectors rely mainly on finding robust appearance features that allow the human form
to be discriminated from a cluttered background, using a classifier such as an SVM or
AdaBoost applied to a set of sub-images obtained with a sliding window. Typical
features include rectified Haar wavelets [17], rectangular features [23], and SIFT-like
(Scale Invariant Feature Transform) features such as histograms of oriented gradients
[16, 3]. Papageorgiou et al. [17] described a pedestrian detector based on an SVM
using Haar wavelet features. Gavrila and Philomin [6] presented a real-time pedestrian
detection system that uses silhouette information extracted from edge images.
The candidate silhouette is selected as the one with the smallest chamfer
distance to a set of learned human shape examples. On the other hand, there has been
little progress on dynamic detectors, although the idea of using pure motion information
for human pattern recognition is not new [11, 9, 20]. Most existing work utilises
optic flow. Viola et al. [23] proposed a very efficient detector using AdaBoost that
can achieve real-time performance. The rather simple rectangular features and the
cascade structure account for the efficiency of this approach. Motion information
was also taken into account through a coarse estimation of optic flow between two
consecutive frames. Similar work using optic flow for people detection can be
found in [4]. To achieve satisfactory performance, this approach assumes that the
human motion information in the test sequences is similar to that in the training
set. Other related work using motion information includes human behavior recog-
nition by distribution of 3D spatial-temporal interest points [22, 13], 3D volumetric
features [12], or through 3D correlation [1]. Overall, existing methods for computing
motion mostly assume that the motion is locally smooth. However, this assumption
often fails, especially in busy public scenes, where optic flow measurement is
sensitive to noise and unreliable due to lighting changes, reflections, and moving
background such as tree leaves (see Fig. 2).
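
The chamfer-distance selection mentioned above for silhouette matching can be illustrated with a minimal sketch. It assumes one common definition, the mean distance from each template point to its nearest image edge point, which is not necessarily the exact formulation used in [6]. The function names and the toy point sets are hypothetical and chosen only for illustration.

```python
import math


def chamfer_distance(template_points, edge_points):
    """Mean distance from each template point to its nearest edge point.

    This is a directed, brute-force variant (an assumption of this sketch);
    practical systems usually precompute a distance transform instead.
    """
    if not template_points or not edge_points:
        raise ValueError("both point sets must be non-empty")
    total = 0.0
    for tx, ty in template_points:
        total += min(math.hypot(tx - ex, ty - ey) for ex, ey in edge_points)
    return total / len(template_points)


def best_template(templates, edge_points):
    """Return (index, distance) of the template with the smallest chamfer distance."""
    distances = [chamfer_distance(t, edge_points) for t in templates]
    best = min(range(len(templates)), key=distances.__getitem__)
    return best, distances[best]


# Toy data (hypothetical): edge pixels of an image and two candidate shapes.
edges = [(0, 0), (0, 1), (0, 2), (1, 2)]
templates = [
    [(0, 0), (0, 1), (0, 2)],  # lies exactly on the image edges
    [(3, 0), (3, 1), (3, 2)],  # same shape shifted away from the edges
]
index, distance = best_template(templates, edges)
```

In this toy case the first template coincides with the edge pixels, so it is the one selected. The brute-force nearest-point search is used only to keep the sketch self-contained.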
To date, work on utilising both motion and appearance information remains in its
infancy. To the best of our knowledge, there is little work performing direct people
detection using both appearance and long-term motion information, although our previous
work [25] has shown some promising detection results using a long-term motion score.
In this work, we present a robust framework for people detection in highly cluttered
public scenes that utilizes both human appearance and long-term motion information
when reliable optic flow cannot be estimated. We further introduce a spatial
pyramid Gaussian Mixture approach to effectively model the variations of long-term
motion information, which takes into account local geometric constraints and
shows slightly better results than using the pure motion score alone [25]. Our method does
not require the estimation of continuous motion such as optic flow in training, and thus
reduces the number of features required for training a classifier. It allows any
detected appearance hypothesis to be verified using long-term motion history analysis.
We show experimental results to demonstrate the efficiency and robustness of
the proposed approach against that of a state-of-the-art static people detector.
 