Optical flow-based representation for video action detection - Emerging Trends in Image Processing, Computer Vision, and Pattern Recognition


entation in these cases. Temporal data mining and time-series classification are examples of approaches to temporal information retrieval.

The types of features, and how well they describe the domain knowledge, also influence temporal information processing and its applications. High dimensionality also makes an effective representation of temporal information with more complicated features important. Therefore, feature definition, construction, and extraction methods play an important role in processing temporal information. As the focus here is feature extraction and construction, the improvements are measured with common methods.

In content-based video information retrieval, visual video data behave like temporal information: a sequence of frames over time. Each frame of the video carries its visual information along with its time value, so the temporal information representation depends heavily on the visual content of the video frames. The most basic and primitive representation of temporal video information uses the pixel intensities of all frames of the video. While this representation includes the richest visual information, processing and interpreting it is impractical. For a 10 s scene at 30 frames/s (fps) with a 600 × 480 frame size, this approach yields 86.4M features. Therefore, there is a need for efficient representation formalisms.
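The 86.4M figure follows directly from multiplying frame size by frame count. The short sketch below reproduces that arithmetic; the function name and the `channels` parameter are illustrative assumptions, with one intensity value per pixel assumed to match the figure in the text.

```python
def raw_feature_count(width, height, fps, seconds, channels=1):
    """Count scalar features when every pixel intensity of every frame is kept.

    channels=1 assumes a single intensity value per pixel (an assumption
    chosen here because it matches the 86.4M figure in the text).
    """
    frames = fps * seconds
    return width * height * channels * frames


count = raw_feature_count(width=600, height=480, fps=30, seconds=10)
print(count)  # 86400000, i.e. 86.4M features
```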

Key-frame-based representation is one candidate approach for representing temporal information in videos. For each scene, a key frame is selected based on some calculations using visual features. This key frame represents the entire scene, which decreases the feature size of the representation. However, key-frame-based approaches have an important problem: they lose the important information that results from motion in the video.
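The text does not say which calculation selects the key frame. As one possible illustration, the sketch below picks the frame whose intensity histogram is closest to the average histogram of the scene. The selection rule, the function names, and the flat-list frame format are all assumptions made for this example.

```python
def histogram(frame, bins=4, max_value=256):
    """Normalized intensity histogram of a frame given as a flat list of ints."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return [c / len(frame) for c in counts]


def select_key_frame(frames, bins=4):
    """Return the index of the frame closest to the scene's mean histogram.

    This is a hypothetical selection rule, not the one used in the source.
    """
    hists = [histogram(f, bins) for f in frames]
    mean = [sum(h[b] for h in hists) / len(hists) for b in range(bins)]

    def l1_distance(h):
        return sum(abs(a - b) for a, b in zip(h, mean))

    return min(range(len(frames)), key=lambda i: l1_distance(hists[i]))


# Toy scene: a dark frame, a half-dark/half-bright frame, a bright frame.
scene = [[0, 0, 0, 0], [0, 0, 255, 255], [255, 255, 255, 255]]
key_index = select_key_frame(scene)
print(key_index)  # 1
```

Whatever rule is used, the result is a single frame, so the order and direction of change across the scene is not part of the representation.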

Another approach is the BoW approach for frame sequences. In this kind of representation, frames are treated as code words obtained by grouping the frames according to their visual features. With these code words, frame sequences are represented as sentences. This kind of representation retains the temporal nature of the scenes. However, its most important disadvantage is the restricted nature of the code words. Representing a visually rich frame with a label means losing a significant amount of information. The representation is limited by the variety of the code words, so a limitless variety of frames is reduced to a very limited number of labels.
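The quantization step described above can be sketched as follows. The example assumes a codebook has already been obtained by grouping frame features (the grouping itself is not shown), and the two-dimensional feature vectors, the codebook values, and the function names are hypothetical.

```python
def nearest_code_word(feature, codebook):
    """Return the index of the code word closest to a frame's feature vector."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)),
               key=lambda k: squared_distance(feature, codebook[k]))


def encode_sequence(frame_features, codebook):
    """Turn a frame sequence into a 'sentence' of code-word labels, in order."""
    return [nearest_code_word(f, codebook) for f in frame_features]


# Hypothetical codebook with two code words, e.g. from clustering frames.
codebook = [(0.0, 0.0), (1.0, 1.0)]
# Four visually distinct frames, each summarized by a 2-D feature vector.
frames = [(0.1, 0.0), (0.2, 0.1), (0.9, 1.0), (0.8, 0.7)]

sentence = encode_sequence(frames, codebook)
print(sentence)  # [0, 0, 1, 1]
```

The four distinct frames collapse to two labels, which is the information loss the paragraph describes, while the order of the labels keeps the temporal structure.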

Interest-point-based representation is an alternative formalism for temporal video information. Interest points are the “important” features that may best represent the video frames, invariant to scale and noise. This alternative is very successful in reducing the huge frame information to small but descriptive patterns. Despite its descriptiveness, however, it is again at a disadvantage in detecting motion features. As motion features involve flow over time, it is important to track the features over time. A representation built on interest points alone lacks this motion-based information.

State-space methods are also used for representing temporal video information. These methods define features that span time. The space-time interest point concept was proposed by Laptev and Lindeberg [16]. Interest points that are spatially defined and extracted in 2D are extended with time, which gives them a 3D structure. A space-time 3D sketch of frame patterns can therefore be obtained, ready for processing. State-space approaches best fit the temporal representation of video information, as they can associate time with the visual information in a descriptive and integrated way.

In our study, a state-space-based representation approach is proposed. Optical flow, a motion feature that integrates time with visual features, is used to constitute the state-space method.
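This section does not state which optical flow algorithm is used. To illustrate how optical flow ties time to visual content, the sketch below uses a single-patch Lucas-Kanade least-squares estimate as a stand-in: it solves Ix·u + Iy·v = −It over all interior pixels of a patch. The function name, the list-of-rows frame format, and the synthetic test pattern are assumptions for this example only.

```python
def lucas_kanade_patch(prev, curr):
    """Estimate one (u, v) displacement for a whole patch by least squares.

    prev, curr: equally sized 2-D lists (rows of intensities) from two
    consecutive frames. Lucas-Kanade is a stand-in technique chosen for
    illustration; it is not confirmed as the source's method.
    """
    h, w = len(prev), len(prev[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0  # spatial gradient in y
            it = curr[y][x] - prev[y][x]                  # temporal gradient
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("degenerate gradients: flow is not unique for this patch")
    # Solve [[sxx, sxy], [sxy, syy]] [u, v]^T = [-sxt, -syt]^T
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


# Synthetic pattern I(x, y) = x * y, moved one pixel to the right.
size = 6
frame_a = [[x * y for x in range(size)] for y in range(size)]
frame_b = [[(x - 1) * y for x in range(size)] for y in range(size)]

u, v = lucas_kanade_patch(frame_a, frame_b)
print(u, v)  # close to 1.0 and 0.0
```

The resulting (u, v) vector is a feature that exists only across frames, which is what distinguishes it from the key-frame and interest-point representations discussed earlier.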
