Table 11.1 The characteristics of the training and test movies of the Hollywood movie dataset
(the number of movies and video segments, and the number and percentage of violent and
nonviolent video segments)

Dataset  Movies  Video segments  Violent         Nonviolent
Train    17      201,216         24,517 (12%)    176,699 (88%)
Test     7       88,483          11,594 (13%)    76,889 (87%)
Total    24      289,699         36,111 (12.5%)  253,588 (87.5%)
Four 1, Fargo, Forrest Gump, Legally Blonde, Pulp Fiction, The Godfather 1 and
The Pianist —serve as the test set for the main task, which is to detect violence in
Hollywood movies. The training set (17 movies) consists of 201,216 video segments
and the test set (7 movies) consists of 88,483 video segments. Table 11.1 presents
the main characteristics of the dataset in more detail. The movies of the training
and test sets were selected so that both sets contain movies with varying levels of
violence (from extreme to none). Across both sets, around 12.5% of segments are
annotated as violent (about 12% in the training set and 13% in the test set).
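The totals and percentages reported in Table 11.1 can be cross-checked with a few lines of arithmetic. The following is a minimal sketch: only the segment counts come from the table, while the dictionary layout and the function names are illustrative assumptions.

```python
# Segment counts copied from Table 11.1 as (violent, nonviolent).
COUNTS = {
    "Train": (24_517, 176_699),
    "Test": (11_594, 76_889),
}


def violent_share(violent: int, nonviolent: int) -> float:
    """Return the percentage of segments annotated as violent."""
    return 100.0 * violent / (violent + nonviolent)


def summarize(counts):
    """Return {name: (total_segments, violent_percentage)} plus a pooled 'Total' row."""
    rows = {}
    for name, (violent, nonviolent) in counts.items():
        rows[name] = (violent + nonviolent, violent_share(violent, nonviolent))
    total_violent = sum(v for v, _ in counts.values())
    total_nonviolent = sum(n for _, n in counts.values())
    rows["Total"] = (
        total_violent + total_nonviolent,
        violent_share(total_violent, total_nonviolent),
    )
    return rows


if __name__ == "__main__":
    for name, (total, pct) in summarize(COUNTS).items():
        print(f"{name:<6}{total:>9,}  {pct:5.1f}% violent")
```

The pooled share is computed from the summed counts rather than by averaging the two per-set percentages, because the training set is more than twice the size of the test set.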
The ground truth of the Hollywood dataset was generated by nine human assessors,
partly developers and partly potential users. Violent movie segments are
annotated at the frame level. Automatically generated shot boundaries with their
corresponding key frames are also provided for each movie. A detailed description
of the Hollywood dataset and the ground truth generation is given in [9]. For the
generalization task, which is to detect violence in short web videos, the ground truth
was created by several human assessors (see footnote 4) who followed the subjective
definition of violence explained in Sect. 11.1. A detailed description of the Web video
dataset and the ground truth generation is given in [31].
11.4.2 Experimental Setup
We employed the MIRToolbox v1.4 (footnote 5) to extract 13-dimensional MFCC features.
A frame size of 40 ms without overlap is used so that the audio frames align with the
25 fps video frames. The Matlab toolbox (footnote 6) provided by Uijlings et al. [32]
was used to extract dense HoG and HoF features. Features are extracted as explained in
Sect. 11.3.
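The alignment between audio and video frames follows from 1 / 25 fps = 40 ms per video frame. A minimal sketch of such non-overlapping framing is given below. The 44,100 Hz sampling rate and the function names are assumptions made for illustration and do not come from the text; the MFCC computation itself, which the experiments perform with the MIRToolbox, is not reproduced here.

```python
def frame_length_samples(sample_rate_hz: int, fps: int = 25) -> int:
    """Number of audio samples that span exactly one video frame."""
    if sample_rate_hz % fps != 0:
        raise ValueError("sample rate must be divisible by the video frame rate")
    return sample_rate_hz // fps


def split_into_frames(samples, sample_rate_hz: int, fps: int = 25):
    """Split audio into consecutive, non-overlapping frames.

    A trailing partial frame (shorter than one video frame) is dropped.
    """
    n = frame_length_samples(sample_rate_hz, fps)
    usable = len(samples) - len(samples) % n
    return [samples[i:i + n] for i in range(0, usable, n)]


if __name__ == "__main__":
    sample_rate = 44_100  # assumed sampling rate, not stated in the text
    audio = [0.0] * (2 * sample_rate + 100)  # two seconds plus a partial frame
    frames = split_into_frames(audio, sample_rate)
    print(len(frames), "frames of", len(frames[0]), "samples each")
```

With this framing, the i-th audio frame covers the same time span as the i-th video frame, so audio and visual features can be paired by index.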
We employed the SPAMS toolbox (footnote 7) to compute the sparse codes that
are used to generate the mid-level audio and visual representations.
4 Annotations were made available by Fudan University, Vietnam University of Science, and
Technicolor.
5 https://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/ .
6 http://homepages.inf.ed.ac.uk/juijling/index.php#page=software/ .
7 http://spams-devel.gforge.inria.fr/ .
 