3.3 Descriptors
We use the original terms of [13] with the temporal extension. A cell is a small
spatio-temporal rectangular cuboid of 8 × 8 pixels and 1/6 sec duration, and
in each cell a 4, 6 or 8-bin histogram is calculated from the quantized optical
flow directions and frame difference gradient directions, while the magnitudes
are used for weighted voting. A block is created as a group of several adjacent
cells (2 × 2 cells in our experiments), and is used for normalizing the
histograms of the cells. The features are the normalized histograms: HFD and
HDG. A detection window of 96 × 128 pixels and 1 sec duration is tiled by these
overlapping blocks. The features (normalized histograms of the blocks) in the
spatio-temporal window are concatenated to form a vector and are used to
recognize the event. Fig. 2 demonstrates the extracted optical flow cell
histograms before block normalization; (a) and (b) are two samples taken from
the positive dataset containing stand up events, while (c) is one sample from
the negative set. Each direction in the 6-bin histogram is represented by 10
pixels in the cell; the color is determined in the HSL color space according to
$$H_i = (i - 1/2) \times 30 - 90 \tag{3}$$
$$S_i = 1 \tag{4}$$
$$L_i = 0.5 \times h_i, \tag{5}$$
where $h_i$ is the histogram value of the $i$th bin, and $H_i$ is determined as
the mean direction of the bin (see Fig. 1(b) and (c)). To express the direction
of the motion, the center pixels are represented by the bin with the highest
$h_i$ value.
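The color mapping of Eqs. (3)-(5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' rendering code: the function names `bin_color` and `dominant_bin` are hypothetical, $h_i$ is assumed to be normalized to [0, 1], and negative hue angles are assumed to wrap modulo 360 degrees before conversion to RGB.

```python
import colorsys


def bin_color(i, h_i, bin_width_deg=30.0, offset_deg=90.0):
    """Return ((H_i, S_i, L_i), rgb) for the 1-based bin index i.

    h_i is assumed to lie in [0, 1]; the default bin width and offset are
    the constants of Eq. (3), which correspond to the 6-bin histogram.
    """
    hue_deg = (i - 0.5) * bin_width_deg - offset_deg  # Eq. (3)
    sat = 1.0                                         # Eq. (4)
    light = 0.5 * h_i                                 # Eq. (5)
    # colorsys takes hue in [0, 1) and the argument order (h, l, s).
    # Wrapping negative angles modulo 360 is an assumption of this sketch.
    rgb = colorsys.hls_to_rgb((hue_deg % 360.0) / 360.0, light, sat)
    return (hue_deg, sat, light), rgb


def dominant_bin(hist):
    """Return the 1-based index of the bin with the highest value."""
    return max(range(len(hist)), key=hist.__getitem__) + 1
```

With these defaults the six hue values run from -75 to 75 degrees in steps of 30, and an empty bin ($h_i = 0$) is drawn black because its lightness is zero.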
Since each descriptor is single-scale, the detection window can be used to scan
the multi-scale representation of the input video segments, and the extracted
feature vectors are used in an SVM classifier to recognize the action. In our
experiments, to obtain the desired 1 sec long window we applied temporal nearest
neighbor
Fig. 2. Extracted histograms in the cells without block normalization, taken
from the same temporal positions; (a) and (b) are from the positive sample set
(stand up event); (c) is one sample from the negative set. The hue value in the
HSL color space is used for visualizing the directions of the bins (see
Fig. 1(c)).
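The descriptor construction of this section (magnitude-weighted cell histograms, block normalization, concatenation over the window) can be sketched as follows. This is a minimal sketch, not the authors' implementation. Several details are assumptions that the text does not specify: directions are folded into a 180-degree range starting at -90 degrees (suggested by Eq. (3)), blocks are L2-normalized, blocks are formed spatially within each temporal slice, and overlapping blocks advance by one cell. All function names are hypothetical. The same routine would apply to either direction source (optical flow or frame difference gradient).

```python
import math


def cell_histogram(directions_deg, magnitudes, n_bins=6, span_deg=180.0):
    """Magnitude-weighted histogram of quantized directions for one cell."""
    hist = [0.0] * n_bins
    bin_width = span_deg / n_bins
    for angle, mag in zip(directions_deg, magnitudes):
        # Assumption: bins cover [-span/2, span/2), matching Eq. (3).
        folded = (angle + span_deg / 2.0) % span_deg
        b = min(int(folded // bin_width), n_bins - 1)
        hist[b] += mag  # the magnitude is the weight of the vote
    return hist


def normalize_block(cell_hists, eps=1e-6):
    """Concatenate the histograms of one block and L2-normalize them.

    The L2 norm is an assumption; the text only says that blocks are
    used for normalization.
    """
    v = [x for h in cell_hists for x in h]
    norm = math.sqrt(sum(x * x for x in v) + eps * eps)
    return [x / norm for x in v]


def window_descriptor(cells, block=2, stride=1):
    """Build the feature vector of one detection window.

    cells[t][y][x] is the histogram of the cell at temporal index t,
    row y and column x. Blocks of block x block cells are taken inside
    each temporal slice with the given stride (both are assumptions).
    """
    feats = []
    for slice_ in cells:
        rows, cols = len(slice_), len(slice_[0])
        for y in range(0, rows - block + 1, stride):
            for x in range(0, cols - block + 1, stride):
                group = [slice_[y + dy][x + dx]
                         for dy in range(block) for dx in range(block)]
                feats.extend(normalize_block(group))
    return feats
```

For a 96 × 128 pixel, 1 sec window with 8 × 8 pixel, 1/6 sec cells, the grid holds 6 temporal slices of 16 rows by 12 columns of cells; the resulting vector length depends on the block layout and stride assumed above.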