Recognizing Human Actions by Using Spatio-temporal Motion Descriptors - Advanced Concepts for Intelligent Vision Systems


Fig. 4. Example video frames from the positive samples (frontal stand up events). Practical problems (e.g. motion in the background) and intra-class variability (e.g. hand gestures, phone call) are clearly visible.

4.1 Drinking Dataset

To evaluate our method on an existing public dataset, we used the drinking events introduced in [14]. However, only a limited number of shots are publicly available on the author's website (33 shots in a single AVI file), so our tests are limited to these shots. Negative training and test samples were created by cropping randomly sized video parts from random temporal positions. The negative dataset contains 39 samples.
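The negative sampling described above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names, the minimum window size, the example video shape, and the choice to randomize the spatial position as well as the temporal one are all assumptions, since the text only states that randomly sized parts were cropped from random temporal positions.

```python
import random


def sample_negative_window(video_shape, min_size=(8, 16, 16), rng=None):
    """Pick one random window (t0, y0, x0, length, height, width).

    video_shape is (frames, height, width). min_size is a hypothetical
    lower bound on the window extent; the source does not state one.
    """
    if rng is None:
        rng = random.Random()
    window = []
    for extent, smallest in zip(video_shape, min_size):
        if smallest > extent:
            raise ValueError("minimum window is larger than the video")
        size = rng.randint(smallest, extent)    # random size along this axis
        start = rng.randint(0, extent - size)   # random start that still fits
        window.append((start, size))
    (t0, length), (y0, height), (x0, width) = window
    return t0, y0, x0, length, height, width


def sample_negative_windows(video_shape, count, seed=0):
    """Draw `count` windows with a fixed seed so the set is reproducible."""
    rng = random.Random(seed)
    return [sample_negative_window(video_shape, rng=rng) for _ in range(count)]


# Hypothetical clip of 300 frames at 640 x 480; 39 matches the negative set size.
windows = sample_negative_windows((300, 480, 640), count=39)
```

Fixing the seed is a design choice of the sketch: it keeps the negative set identical across runs, which makes training and test splits comparable.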

4.2 Stand Up Dataset¹

Most existing datasets were recorded in controlled environments, so we started to develop a new, realistic dataset recorded in an indoor environment. During development we focused on practical problems such as moving objects in the background, occlusion, and hand movements of the actors. Videos were recorded in our office with an ordinary IP camera. The dataset currently contains actions of six actors recorded in seven different scenes. For the recordings we used the camera's own software, which compressed the video with a standard MPEG-4 ASP coder at 1200 kbps. The videos were recorded at 640 × 480 size and 30 fps.

Positive samples. From the recordings we manually cropped the frontal stand up events using a window with 0.75 aspect ratio. This window contained the body from the knee to the head, with several extra pixels at the borders. Finally, the windows were resized to 96 × 128 pixels using bicubic interpolation. In this set the duration of the events falls between 0.58 sec (18 frames) and 1.37 sec (41 frames). Currently the dataset contains 72 video sequences of the event. Fig. 4 presents example frames from the dataset.
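The geometry of the cropping step can be illustrated with a short sketch. It assumes that the 0.75 aspect ratio means width divided by height (which agrees with the 96 × 128 target size), that the window is centered on a manually marked body box, and that the border margin is 4 pixels; the function names, the margin, and the example coordinates are hypothetical. The sketch only computes the window and the scale factors; the bicubic resampling itself would be done by an image library and is not shown.

```python
def fit_aspect_window(x0, y0, x1, y1, aspect=0.75, margin=4):
    """Grow a body box (x0, y0, x1, y1) into a window with width/height == aspect.

    Returns (left, top, width, height). The margin stands in for the
    "several extra pixels at the borders"; its value is an assumption.
    The window is not clipped to the frame in this sketch.
    """
    width = (x1 - x0) + 2 * margin
    height = (y1 - y0) + 2 * margin
    if width / height < aspect:
        width = height * aspect    # box too narrow: widen it
    else:
        height = width / aspect    # box too wide: make it taller
    center_x = (x0 + x1) / 2
    center_y = (y0 + y1) / 2
    return center_x - width / 2, center_y - height / 2, width, height


def scale_to_target(width, height, target=(96, 128)):
    """Horizontal and vertical scale factors that map the window to the target size."""
    return target[0] / width, target[1] / height


# Hypothetical knee-to-head box in a 640 x 480 frame.
left, top, width, height = fit_aspect_window(100, 50, 220, 290)
scale_x, scale_y = scale_to_target(width, height)
```

Because the window already has the target aspect ratio, the two scale factors are equal, so the resize does not distort the body proportions.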


Negative samples. We used two sets as negative samples. For the first set we manually selected segments in which different types of actions and movements were present, and resized them with the same method used for the positive samples. This dataset currently contains 67 video sequences. The second negative set was created by cropping randomly sized spatio-temporal windows (assuming

¹ Publicly available at http://web.eee.sztaki.hu/~ucu/sztaki_standup.tar.gz
