Fig. 11.3 The generation of violence detection models with feature space partitioning: training samples (violent and nonviolent) are partitioned by k-means clustering, and a two-class SVM model (SVM model-1, ..., SVM model-M) is learned for each resulting cluster (classifier-1, ..., classifier-M), forming the classifier ensemble.
We perform the feature space partitioning by clustering video segments of 0.6 s length in our training dataset and learn a different model for each violence subconcept (i.e., cluster). We use two-class SVMs to learn the violence models. An overview of the generation of violence detection models is presented in Fig. 11.3.
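To make this step concrete, the following sketch shows one way the partitioning and per-cluster training could be implemented with scikit-learn. The feature matrix X (one row of features per 0.6 s segment), the labels y, the number of clusters, and the RBF kernel are assumptions of this illustration, not details taken from the text.

```python
# Sketch of feature space partitioning with per-cluster SVM training.
# Assumed inputs: X, an (n_segments x n_features) array of features for
# 0.6 s segments; y, binary labels (1 = violent, 0 = nonviolent).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def train_partitioned_models(X, y, n_clusters=8, random_state=0):
    """Partition the feature space with k-means and fit one two-class SVM
    per cluster (i.e., per violence subconcept). n_clusters is illustrative."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    cluster_ids = kmeans.fit_predict(X)

    models = {}
    for c in range(n_clusters):
        mask = cluster_ids == c
        if np.unique(y[mask]).size < 2:
            # Degenerate cluster containing a single class; skipped in this sketch.
            continue
        svm = SVC(kernel="rbf", probability=True)  # kernel choice is an assumption
        svm.fit(X[mask], y[mask])
        models[c] = svm
    return kmeans, models
```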
In the learning step, the main issue is the problem of imbalanced data. This is caused by the fact that, in the training dataset, the number of nonviolent video shots is much higher than the number of violent ones. As a result, the learned boundary lies too close to the violent instances, and the SVM tends to classify every sample as nonviolent. Different strategies exist to “push” this decision boundary toward the nonviolent samples. Although more sophisticated methods dealing with the imbalanced data issue have been proposed in the literature (see [19] for a comprehensive survey), we choose, in the current framework, to perform random undersampling to balance the number of violent and nonviolent samples (with a balance ratio of 1:2). This method, proposed by Akbani et al. [2], appears to be particularly well suited to the application context of our work. In [2], different under- and oversampling strategies are compared; according to the results, SVM with the undersampling strategy provides the most significant performance gain over standard two-class SVMs. In addition, the efficiency of the training process is improved as a result of the reduced amount of training data; hence, training scales easily to large datasets such as the ones used in our work.
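As an illustration of the undersampling step, the sketch below keeps all violent samples and randomly retains at most twice as many nonviolent ones, matching the 1:2 balance ratio mentioned above; the function name and the label encoding are hypothetical.

```python
# Sketch of random undersampling at a 1:2 violent/nonviolent ratio.
# Assumed label encoding: 1 = violent (minority), 0 = nonviolent (majority).
import numpy as np

def undersample(X, y, ratio=2, seed=0):
    """Keep every violent sample and at most `ratio` times as many randomly
    chosen nonviolent samples, then return the shuffled subset."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)   # violent samples
    neg = np.flatnonzero(y == 0)   # nonviolent samples
    keep = rng.choice(neg, size=min(len(neg), ratio * len(pos)), replace=False)
    idx = np.concatenate([pos, keep])
    rng.shuffle(idx)
    return X[idx], y[idx]
```

In a pipeline such as the one sketched earlier, this step would be applied to the training data of each cluster before fitting the corresponding SVM.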
In the test phase, the main challenge is to combine the classification results of the violence models. We solve this by performing classifier selection. More specifically, we first determine the cluster nearest to a test video segment using the Euclidean distance. Once the classifier for the video sample is determined, the output of the chosen model is used as the final prediction for that sample. An overview of the test phase of our method is presented in Fig. 11.4.
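A possible implementation of this classifier-selection step is sketched below; it reuses the kmeans object and the models dictionary produced by the hypothetical training sketch above, both of which are assumptions of this illustration.

```python
# Sketch of the test phase: route a segment to the nearest cluster centroid
# (Euclidean distance) and use that cluster's SVM as the final decision.
# Assumes `kmeans` and `models` come from train_partitioned_models() above.
import numpy as np

def predict_segment(x, kmeans, models):
    """Return the prediction of the SVM attached to the nearest cluster."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    # Euclidean distance from the segment to every cluster centroid.
    dists = np.linalg.norm(kmeans.cluster_centers_ - x, axis=1)
    nearest = int(np.argmin(dists))
    # A cluster without a trained model would need a fallback, omitted here.
    return models[nearest].predict(x)[0]
```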