important field of study. From this point of view, event boundary detection, temporal video
segmentation, and cut detection are closely related concepts addressing this problem.
As in temporal video segment representation, textual and audio features, together with visual features, are important sources of information for temporal video segmentation. Our earlier concern about automaticity, and about the dependence of textual features on manual creation, applies here as well; consequently, visual features remain dominant for this problem too.
Given this dominance, detecting cuts between video scenes using visual features is an important problem. Optical flow is the key concept here, acting as an operator inspired by the representation proposed for action recognition in this study. The fundamental idea is that some change in the character of the optical flow marks a cut. In detail, the hypothesis is that the difference of intensity values between the pixels of consecutive frames (mapped through optical flow vectors) changes at the cut points. The optical flow vectors calculated in the first phase, video segment representation, can also be used here as building-block features, operating through pixel-difference calculations to represent scene changes. This reduces computational complexity, because the feature base, the optical flow vectors, is the same and is calculated only once for both phases.
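A minimal sketch of this hypothesis, assuming OpenCV's Farnebäck flow and grayscale frames as NumPy arrays (the running-average baseline, `window`, and `jump_ratio` are our illustrative choices, not the chapter's):

```python
import cv2
import numpy as np

def flow_compensated_diff(prev_gray, next_gray):
    """Mean absolute intensity difference between each pixel of the
    previous frame and the pixel it maps to in the next frame through
    its optical flow vector."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_gray.shape
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    # Sample the next frame at each pixel's flow destination.
    warped = cv2.remap(next_gray, xs + flow[..., 0], ys + flow[..., 1],
                       cv2.INTER_LINEAR)
    return float(np.mean(np.abs(warped.astype(np.float32)
                                - prev_gray.astype(np.float32))))

def detect_cuts(frames, window=10, jump_ratio=3.0):
    """Flag a cut wherever the flow-compensated difference jumps well
    above its recent running average (a hypothetical decision rule)."""
    diffs = [flow_compensated_diff(frames[i], frames[i + 1])
             for i in range(len(frames) - 1)]
    cuts = []
    for i, d in enumerate(diffs):
        baseline = np.mean(diffs[max(0, i - window):i]) if i > 0 else d
        if baseline > 0 and d > jump_ratio * baseline:
            cuts.append(i + 1)  # cut lies between frames i and i + 1
    return cuts
```

Within a continuous shot the flow-compensated difference stays small, since the flow explains most pixel motion; at a cut the flow cannot map pixels onto the new scene, so the difference spikes.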
The optical flow vectors estimated for each frame in the previous part can be reused. The optical flow-based generic representation R = [ S ( V ), Φ ], defined in Section 5, can be adapted to cut detection. In this adaptation, Φ is the operator that defines the relations between the optical flow vectors S ( V ) and gives them their meaning for representing cuts. When adapting the representation to temporal video segmentation (cut detection), the description of Φ is therefore central, and the parameters used in its definition must be specified.
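For concreteness, one illustrative instantiation (our assumption; the chapter defines Φ and its parameters in its own terms) is a flow-compensated intensity difference over the pixel set P of a frame:

Φ(S(V))_t = (1/|P|) Σ_{p ∈ P} | I_{t+1}(p + v_t(p)) − I_t(p) |,

where v_t(p) ∈ S(V) is the optical flow vector at pixel p of frame t, and a cut is declared between frames t and t + 1 when this quantity changes abruptly.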
7 Experiments and results
Temporal segment classification for action recognition uses the vector representation proposed in Section 5. Support Vector Machines (SVMs) are used for nonlinear classification. A Gaussian radial basis function with standard deviation σ, applied to two feature vectors x_i and x_j, is selected as the SVM kernel:
K(x_i, x_j) = exp( −‖x_i − x_j‖² / (2σ²) )    (21)
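A minimal sketch of this setup, assuming scikit-learn and random stand-in features (the feature dimensionality and σ value are placeholders; scikit-learn parameterizes the RBF kernel via gamma = 1/(2σ²), and one-vs-rest handles the multi-label classes):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
# Stand-ins for the flow-based feature vectors of Section 5:
# 219 training segments, 211 test segments, 8 action classes.
X_train = rng.normal(size=(219, 30))
X_test = rng.normal(size=(211, 30))
Y_train = rng.integers(0, 2, size=(219, 8))   # multi-label indicators

sigma = 1.0  # hypothetical value; sigma is tuned in practice
# scikit-learn's RBF kernel is exp(-gamma * ||x_i - x_j||^2),
# so gamma = 1 / (2 * sigma^2) matches Eq. (21).
clf = OneVsRestClassifier(SVC(kernel="rbf", gamma=1.0 / (2 * sigma ** 2)))
clf.fit(X_train, Y_train)
Y_pred = clf.predict(X_test)                  # one binary label per class
```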
The Hollywood Human Actions dataset [ 41 ] is used for evaluation. The dataset includes video segments of human actions taken from 32 movies. Each segment is labeled with one or more of 8 action classes: AnswerPhone, GetOutCar, HandShake, HugPerson, Kiss, SitDown, SitUp, and StandUp. The test set is obtained from 20 movies, while the training set is obtained from 12 other movies, disjoint from those in the test set. The training set contains 219 video segments and the test set contains 211 samples, with manually created labels.
After the optical flows are estimated, the feature-vector calculations are carried out accordingly, and feature vectors are obtained for the test data. The number of angular intervals is taken as 30, as in Ref. [ 14 ]. The threshold C in the threshold function, as discussed in this section, was determined experimentally.
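As a hedged illustration of how such a feature vector might be built (the exact threshold function is not reproduced in this excerpt; magnitude thresholding of the flow vectors before angular binning is our assumption):

```python
import numpy as np

def flow_angle_histogram(flow, n_bins=30, C=1.0):
    """Normalized histogram of optical flow directions over n_bins
    angular intervals, keeping only vectors whose magnitude exceeds C.
    Magnitude thresholding is our assumed form of the threshold
    function; C is determined experimentally."""
    u = flow[..., 0].ravel()
    v = flow[..., 1].ravel()
    keep = np.hypot(u, v) > C
    angles = np.arctan2(v[keep], u[keep])     # angles in (-pi, pi]
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)          # normalized feature vector
```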
 