Digital Signal Processing Reference
need an accurate silhouette segmentation, but they are limited in scenarios with
complex backgrounds, illumination changes, noise, and, obviously, a moving camera.
On the other hand, the GMDs are commonly used in surveillance applications to
detect abnormal movements or to characterize human activities by computing relevant
features that highlight and summarize motion. For example, 3D spatio-temporal Haar
features have been proposed to build volumetric descriptors in pedestrian applications
[6]. GMDs are also frequently based on the apparent motion field (optical flow), a
choice justified by its relative independence from visual appearance. For instance,
Ikizler et al. [3] used histograms of orientations of (block-based) optical flow
combined with contour orientations. This method can distinguish simple periodic
actions, but its temporal integration is too limited to address more complex
activities. Guangyu et al. [5] use dense optical flow and histogram descriptors, but
their representation, based on human-centric spatial pattern variations, limits their
approach to specific applications. Chaudhry et al. [4] proposed Histograms of Oriented Optical Flow
(HOOF) to describe human activities. Our descriptor for the instantaneous velocity
field is very close to the HOOF descriptor, with significant differences that will be
highlighted later. The temporal part of their descriptor, however, is based on time
series of HOOFs, which is very different from our approach.
The main contribution of this work is a motion descriptor which is both entirely
based on dense optical flow information and usable for recognition of actions or
events occurring in surveillance video sequences. The instantaneous movement in-
formation, represented by the optical flow field at every frame, is summarized by
orientation histograms, weighted by the norm of the velocity. The temporal sequence
of orientation histograms is characterized, at every histogram bin, by temporal
statistics computed over the sequence. The resulting motion descriptor achieves a
compact human activity description, which is used as the input of a binary SVM
classifier. Evaluation is performed with the Weizmann [8] dataset, from which 10 natural
actions are picked, and also with the ViSOR video-surveillance dataset [9], from
which 5 different activities are used. This paper is organized as follows: Section 2
introduces the proposed descriptor, Section 3 demonstrates the effectiveness of the
method, and the last section concludes with a discussion and possible future work.
2 The Proposed Approach
The method is summarized in Figure 1. It starts by computing a dense optical flow
using the local jet feature space approach [10]. The dense optical flow makes it
possible to segment the region with the most coherent motion as a region of interest
(RoI). A motion orientation histogram is then calculated, typically using 32
directions. Every direction count is weighted by the norm of the flow vector, so a
dominant motion direction can result either from many vectors or from vectors with
large norms. Finally, the motion descriptor aggregates the characteristics of each
direction through simple statistics over the temporal series, whose purpose is to
capture the nature of the motion.
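
The two descriptor stages described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the input format (a list of (u, v) velocity vectors per frame), the function names, and the choice of the mean and standard deviation as temporal statistics are all assumptions made for illustration. The optical flow computation and the RoI segmentation are not shown.

```python
import math
from statistics import mean, pstdev


def orientation_histogram(flow, n_bins=32):
    """Norm-weighted histogram of flow directions for one frame.

    `flow` is an iterable of (u, v) velocity vectors; this input
    format is an assumption made for the sketch.
    """
    hist = [0.0] * n_bins
    bin_width = 2.0 * math.pi / n_bins
    for u, v in flow:
        norm = math.hypot(u, v)
        if norm == 0.0:
            continue  # a null vector has no direction
        angle = math.atan2(v, u) % (2.0 * math.pi)  # map to [0, 2*pi)
        index = int(angle / bin_width) % n_bins  # guard against rounding up to n_bins
        hist[index] += norm  # each vector votes with its norm
    return hist


def temporal_descriptor(histograms):
    """Per-bin statistics over a sequence of frame histograms.

    Mean and standard deviation are assumed choices; the text only
    refers to simple temporal statistics.
    """
    per_bin = list(zip(*histograms))  # one time series per direction bin
    means = [mean(series) for series in per_bin]
    stds = [pstdev(series) for series in per_bin]
    return means + stds


if __name__ == "__main__":
    frames = [
        [(3.0, 4.0), (3.0, 4.0), (-3.0, -4.0)],
        [(3.0, -4.0), (0.0, 0.0)],
    ]
    hists = [orientation_histogram(f, n_bins=4) for f in frames]
    print(temporal_descriptor(hists))
```

With 32 directions and two statistics per bin, this sketch yields a 64-dimensional vector per sequence, whatever the sequence length, which is the kind of compact fixed-size input a binary SVM classifier expects.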