1 Introduction
Video action recognition is a field of multimedia research that enables us to recognize actions from a number of observations. The observations on video frames depend on the video features derived from different sources. Textual features carry high-level semantic information, but their extraction cannot be fully automated: the recognition depends strongly on textual sources, which are commonly created manually. Audio features, on the other hand, are restricted to a supporting role. Since audio rarely conveys actions conceptually, it can serve only as an additional resource that complements visual and textual information. Visual video features provide the basic information for video events and actions. Although it is difficult to obtain high-level semantics from visual information alone, a convincing way to construct an independent, fully automated video annotation or action recognition model is to use visual information as the central resource. This approach leads us to content-based video information retrieval.
Content-based video information retrieval is the automatic annotation and retrieval of conceptual video items such as objects, actions, and events using the visual content obtained from video frames. There are various methods to extract visual features and use them for different purposes. The feature sets they use range from static image features (pixel values, color histograms, edge histograms, etc.) to temporal visual features (interest point flows, shape descriptors, motion descriptors, etc.). Temporal visual features combine static image features with time information. Representing video with temporal visual features therefore means modeling the visual content along the time dimension, i.e., constructing temporal video information.
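As a concrete illustration of the static features listed above, the sketch below computes a color histogram and a gradient-orientation edge histogram for a single frame. It assumes OpenCV (cv2) and NumPy; the function names and bin counts are illustrative choices, not the feature set used in this work.

```python
import cv2
import numpy as np

def color_histogram(frame_bgr, bins=32):
    """Per-channel color histogram, normalized to sum to 1."""
    hists = [cv2.calcHist([frame_bgr], [c], None, [bins], [0, 256]).ravel()
             for c in range(3)]
    h = np.concatenate(hists)
    return h / max(h.sum(), 1.0)

def edge_histogram(frame_bgr, bins=16):
    """Histogram of gradient orientations, weighted by gradient magnitude."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy, angleInDegrees=True)
    h, _ = np.histogram(ang, bins=bins, range=(0, 360), weights=mag)
    return h / max(h.sum(), 1.0)
```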
We need to represent temporal video information formally in order to develop video action recognition methods. Visual features such as corners and visual interest points of video frames are the basis for constructing our model. These features are used to build a more complicated motion feature, namely optical flow. In this work, we propose a new temporal video segment representation method that formalizes video scenes as temporal information for retrieving video actions. The representation is fundamentally based on the optical flow vectors calculated for frequently selected frames of the video scene. The weighted frame velocity concept is put forward for a whole video scene together with the set of optical flow vectors. The combined representation is used in action-based temporal video segment classification, whose output corresponds to the recognized actions.
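The flow-extraction step can be pictured with the following minimal sketch, which uses OpenCV's dense Farneback flow as a stand-in for the flow computation detailed in Section 4; the sampling stride for selecting frames is an assumed parameter.

```python
import cv2

def segment_flows(video_path, stride=5):
    """Yield dense optical flow fields between frames sampled
    every `stride` frames of a video segment."""
    cap = cv2.VideoCapture(video_path)
    prev, idx = None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if prev is not None:
                # flow[y, x] holds the (dx, dy) displacement of each pixel
                flow = cv2.calcOpticalFlowFarneback(
                    prev, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
                yield flow
            prev = gray
        idx += 1
    cap.release()
```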
The main contribution of this work is the proposed temporal video segment representation method. It is intended as a generic model for temporal video segment classification for action recognition. The representation is based on the optical flow concept and uses the common approach of partitioning optical flow vectors according to their angular features. An angular grouping of optical flow vectors is computed for each selected frame of the video. We propose the novel concept of weighted frame velocity, the velocity of the cumulative angular grouping of a temporal video segment, in order to describe the motion of the segment's frames more precisely. A sketch of this grouping follows.
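The code below is one illustrative reading of the angular grouping and weighted frame velocity: it assumes flow vectors are binned by direction and weighted by magnitude, and that the per-frame histograms are accumulated over the segment. The exact formulation is defined later in the paper, and the bin count here is an arbitrary choice.

```python
import numpy as np

def angular_histogram(flow, n_bins=8):
    """Group flow vectors into angular bins, weighting each
    vector by its magnitude."""
    dx, dy = flow[..., 0].ravel(), flow[..., 1].ravel()
    mag = np.hypot(dx, dy)
    ang = np.mod(np.arctan2(dy, dx), 2 * np.pi)
    hist, _ = np.histogram(ang, bins=n_bins,
                           range=(0, 2 * np.pi), weights=mag)
    return hist

def weighted_frame_velocity(flows, n_bins=8):
    """Accumulate angular histograms over a segment's selected frames
    and normalize by the frame count: one motion descriptor per segment."""
    total = np.zeros(n_bins)
    count = 0
    for flow in flows:
        total += angular_histogram(flow, n_bins)
        count += 1
    return total / max(count, 1)
```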
The outline of this article is as follows. Section 2 reviews related work. Section 3 discusses the temporal segment representation. Optical flow is described in Section 4, and the optical flow-based segment representation is discussed in Section 5. Section 6 explains how the representation was inspired by cut detection. In Section 7, experiments and results are presented. Finally, Section 8 draws the conclusions.