Fig. 4. Example video frames from the positive samples (frontal stand up events).
Practical problems (e.g. motion in the background) and intra-class variability (e.g. hand
gestures, phone calls) are clearly visible.
4.1 Drinking Dataset
To evaluate our method on an existing public dataset, we used the drinking
events introduced in [14]. However, only a limited number of shots are publicly
available on the author's website (33 shots in a single AVI file), so our tests
are limited to these shots. Negative training and test samples were created
by cropping randomly sized video parts from random temporal positions. The
negative dataset contains 39 samples.
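The random cropping of negative samples can be sketched as follows. This is only an illustration under stated assumptions: the text specifies random sizes and random temporal positions but not the sampling distribution, so uniform sampling, the minimum window size, the `Window` type, and both function names are hypothetical choices, as are the frame dimensions used in the usage note.

```python
import random
from typing import List, NamedTuple, Optional, Tuple


class Window(NamedTuple):
    """A spatio-temporal crop: top-left corner, start frame, and extent."""
    x: int
    y: int
    t: int
    width: int
    height: int
    length: int


def sample_random_window(
    frame_w: int,
    frame_h: int,
    n_frames: int,
    min_size: Tuple[int, int, int] = (24, 32, 10),  # assumed lower bounds
    rng: Optional[random.Random] = None,
) -> Window:
    """Draw one randomly sized window at a random position inside the video."""
    rng = rng or random.Random()
    min_w, min_h, min_len = min_size
    if frame_w < min_w or frame_h < min_h or n_frames < min_len:
        raise ValueError("video is smaller than the minimum window size")

    # Random size first (uniform, which is an assumption) ...
    width = rng.randint(min_w, frame_w)
    height = rng.randint(min_h, frame_h)
    length = rng.randint(min_len, n_frames)

    # ... then a random position that keeps the window inside the video.
    x = rng.randint(0, frame_w - width)
    y = rng.randint(0, frame_h - height)
    t = rng.randint(0, n_frames - length)
    return Window(x, y, t, width, height, length)


def sample_negative_windows(
    frame_w: int, frame_h: int, n_frames: int, count: int, seed: int = 0
) -> List[Window]:
    """Draw `count` independent random windows with a reproducible seed."""
    rng = random.Random(seed)
    return [
        sample_random_window(frame_w, frame_h, n_frames, rng=rng)
        for _ in range(count)
    ]
```

For instance, `sample_negative_windows(320, 240, 100, count=39)` would return 39 windows for a hypothetical 320 × 240 video of 100 frames; the actual frame size and length of the drinking shots are not given in the text.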
4.2 Stand Up Dataset 1
Most existing datasets were recorded in a controlled environment; hence we
started to develop a new, realistic dataset recorded in an indoor environment. During
the development we focused on practical problems such as moving objects in the
background, occlusion, and hand movements of the actors. The videos were recorded
in our office with an ordinary IP camera. The dataset currently contains actions
of six actors recorded in seven different scenes. For the recordings we used the
camera's own software, which compressed the video with a standard MPEG-4 ASP coder
at 1200 kbps. The videos were recorded at a resolution of 640 × 480 pixels and 30 fps.
Positive samples. From the recordings we manually cropped the frontal stand
up events using a window with 0.75 aspect ratio. This window contained the
body from the knee to the head with several extra pixels at the borders. Finally,
the windows were resized to 96 × 128 pixels using bicubic interpolation. In this
set the duration of the events falls between 0.58 sec (18 frames) and 1.37 sec (41
frames). Currently the dataset contains 72 video sequences of the event. Fig. 4
presents example frames from the dataset.
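The window geometry described above can be illustrated with a small helper. The cropping in the dataset was done manually, so this is only a sketch of how a body bounding box could be grown to a 0.75 (width / height) window, which matches the 96 × 128 target size. The function name, the `margin` parameter, and the example box are hypothetical.

```python
from typing import Tuple

TARGET_SIZE = (96, 128)  # (width, height) from the text; 96 / 128 = 0.75


def fit_aspect_window(
    x: float, y: float, w: float, h: float,
    aspect: float = 0.75,   # width / height, as stated in the text
    margin: float = 4.0,    # "several extra pixels"; the value is an assumption
) -> Tuple[float, float, float, float]:
    """Grow a box (x, y, w, h) around its centre to the given aspect ratio."""
    # Add the border margin on every side.
    new_w = w + 2 * margin
    new_h = h + 2 * margin

    # Enlarge one dimension so that new_w / new_h == aspect,
    # never shrinking the box (the body must stay inside the window).
    if new_w / new_h < aspect:
        new_w = new_h * aspect
    else:
        new_h = new_w / aspect

    # Keep the window centred on the original box.
    cx = x + w / 2
    cy = y + h / 2
    return (cx - new_w / 2, cy - new_h / 2, new_w, new_h)


# Hypothetical knee-to-head box of 80 x 200 pixels at (100, 50).
window = fit_aspect_window(100, 50, 80, 200)
```

The resulting window would still need to be clipped to the 640 × 480 frame and then resized to `TARGET_SIZE`. With OpenCV, one possible choice for that last step is `cv2.resize(crop, (96, 128), interpolation=cv2.INTER_CUBIC)`; the text does not say which tool was actually used.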
Negative samples. We used two sets as negative samples. For the first set we
manually selected segments in which different types of actions or movements
were present, and resized them with the same method as the positive
samples. This set currently contains 67 video sequences. The second negative
set was created by cropping randomly sized spatio-temporal windows (assuming
1 Publicly available at http://web.eee.sztaki.hu/~ucu/sztaki_standup.tar.gz
 