3.3 Descriptors
We use the original terms of [13] with the temporal extension. A cell is a small
spatio-temporal rectangular cuboid of 8 × 8 pixels and 1/6 sec duration, and
in each cell a 4-, 6- or 8-bin histogram is calculated from the quantized optical
flow directions and frame difference gradient directions, while the magnitudes
are used for weighted voting. A block is created as a group of several adjacent
cells (2 × 2 × 2 cells in our experiments), and is used for normalizing the
histograms of the cells. The features are the normalized histograms: HFD and
HDG. A detection window of 96 × 128 pixels and 1 sec duration is tiled by these
overlapping blocks. The features (normalized histograms of the blocks) in the
spatio-temporal window are concatenated to form a vector and are used to
recognize the event. Fig. 2 demonstrates the extracted optical flow cell histograms
before block normalization: (a) and (b) are two samples taken from the positive
dataset containing stand up events, while (c) is one sample from the negative
set. Each direction in the 6-bin histogram is represented by 10 pixels in the cell,
and its color is determined in the HSL color space according to
\[ H_i = (i - 1/2) \times 30^\circ - 90^\circ \tag{3} \]
\[ S_i = 1 \tag{4} \]
\[ L_i = 0.5 \times h_i, \tag{5} \]
where h_i is the histogram value of the i-th bin, and H_i is determined as the
mean direction of the bin (see Fig. 1(b) and (c)). To express the direction of the
motion, the center pixels are represented by the bin with the highest h_i value.
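
The color mapping of Eqs. (3)-(5) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names are hypothetical, the bin index i is assumed to be 1-based, and h_i is assumed to be scaled to [0, 1] before visualization (the text does not state the scaling). The standard-library colorsys module is used for the HSL to RGB conversion.

```python
import colorsys

BIN_WIDTH_DEG = 30.0  # bin width of the 6-bin histogram, Eq. (3)
OFFSET_DEG = 90.0     # offset subtracted in Eq. (3)


def bin_color_hsl(i, h_i):
    """Return (H in degrees, S, L) for 1-based bin index i, following Eqs. (3)-(5).

    h_i is assumed to lie in [0, 1] (assumption, not stated in the text).
    """
    hue = (i - 0.5) * BIN_WIDTH_DEG - OFFSET_DEG  # Eq. (3): mean direction of the bin
    saturation = 1.0                              # Eq. (4)
    lightness = 0.5 * h_i                         # Eq. (5)
    return hue, saturation, lightness


def bin_color_rgb(i, h_i):
    """Convert the HSL color of bin i to an (r, g, b) triple in [0, 1]."""
    hue, saturation, lightness = bin_color_hsl(i, h_i)
    # colorsys expects hue in [0, 1) and the argument order (h, l, s);
    # negative hues are wrapped around the color circle.
    return colorsys.hls_to_rgb((hue % 360.0) / 360.0, lightness, saturation)


def dominant_bin(hist):
    """1-based index of the bin with the highest value (drawn at the cell center)."""
    return max(range(len(hist)), key=lambda k: hist[k]) + 1
```

With this mapping an empty bin (h_i = 0) is drawn black, and a bin with h_i = 1 is drawn in the fully saturated hue of its mean direction.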
Since each descriptor is single-scale, the detection window can be used to scan
the multi-scale representation of the input video segments, and the extracted
feature vectors are used in an SVM classifier to recognize the action. In our experiments,
to obtain the desired 1 sec long window we applied temporal nearest neighbor
Fig. 2. Extracted histograms in the cells without block normalization, taken from the
same temporal positions; (a) and (b) are from the positive sample set (stand up event);
(c) is one sample from the negative set. The hue value in the HSL color space is used
for visualizing the directions of the bins (see Fig. 1(c)).
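
The cell and block construction of this section can be illustrated with a minimal Python sketch. It rests on several assumptions that the text does not confirm: directions are folded into a 180° range starting at −90° (consistent with the 30° bin width of the 6-bin case), blocks are L2-normalized, blocks overlap with a stride of one cell, and a block spans 2 cells in time as well as in space. All function names are hypothetical.

```python
import math


def cell_histogram(vectors, n_bins=6):
    """Magnitude-weighted direction histogram of one cell.

    vectors: iterable of (dx, dy) optical flow or gradient vectors in the cell.
    Directions are folded into [-90, 90) degrees (assumption).
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for dx, dy in vectors:
        magnitude = math.hypot(dx, dy)
        if magnitude == 0.0:
            continue  # zero vectors have no direction and cast no vote
        angle = math.degrees(math.atan2(dy, dx))  # in [-180, 180]
        folded = (angle + 90.0) % 180.0           # direction + 90, in [0, 180)
        index = min(int(folded // bin_width), n_bins - 1)
        hist[index] += magnitude                  # weighted voting
    return hist


def normalize_block(cell_hists, eps=1e-6):
    """Concatenate the cell histograms of one block and L2-normalize them
    (the normalization scheme is an assumption)."""
    flat = [v for hist in cell_hists for v in hist]
    norm = math.sqrt(sum(v * v for v in flat) + eps * eps)
    return [v / norm for v in flat]


def blocks_per_axis(n_cells, block_cells, stride_cells):
    """Number of block positions along one axis of the detection window."""
    return (n_cells - block_cells) // stride_cells + 1


def feature_length(n_bins=6, stride_cells=1):
    """Length of one concatenated descriptor for a 96 x 128 pixel, 1 sec window."""
    cells = (96 // 8, 128 // 8, 6)  # 12 x 16 spatial cells, 6 cells of 1/6 sec
    block = (2, 2, 2)               # cells per block (temporal size assumed)
    n_blocks = 1
    for n_cells, block_cells in zip(cells, block):
        n_blocks *= blocks_per_axis(n_cells, block_cells, stride_cells)
    return n_blocks * block[0] * block[1] * block[2] * n_bins
```

Under these assumptions the window holds 11 × 15 × 5 = 825 blocks of 8 cells each, so one 6-bin descriptor has 825 × 48 = 39600 components; a different block stride changes this count.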