Table 10.6 Ground truth and training/testing data used for video classification via
the SVM-based fusion model

Type of concept   Number of instances   Number of movies   Number of training samples   Number of
                  in database           with concept       (positive, negative)         testing samples
Love scene        66                    3                  (22, 78)                     6,000
Music video       41                    4                  (13, 87)                     6,000
Fighting          413                   3                  (137, 163)                   6,000
Ship crashing     201                   1                  (67, 134)                    6,000
Dance party       48                    1                  (16, 84)                     6,000
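The class balance of the training split in Table 10.6 can be made explicit with a
short calculation. The following is a minimal sketch in Python and is not part of the
original study; the names training_split and positive_fraction are illustrative
assumptions, and the counts are copied from the table.

```python
# Training split from Table 10.6: concept -> (positive, negative) samples.
training_split = {
    "Love scene": (22, 78),
    "Music video": (13, 87),
    "Fighting": (137, 163),
    "Ship crashing": (67, 134),
    "Dance party": (16, 84),
}


def positive_fraction(positive, negative):
    """Return the share of positive samples in a training set."""
    return positive / (positive + negative)


# Hypothetical helper output: fraction of positives per concept.
fractions = {
    concept: positive_fraction(pos, neg)
    for concept, (pos, neg) in training_split.items()
}

for concept, fraction in fractions.items():
    print(f"{concept:14s} {fraction:.2%}")
```

On these figures, 'Fighting' has the most balanced training set (about 46 % positive)
and 'Music video' the least balanced (13 % positive).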
Table 10.7 Recognition rate obtained by the SVM-based fusion model

Type of concept   Accuracy (%)   False positive rate (%)   False negative rate (%)
Love scene        90.97          8.91                      19.70
Music video       91.03          9.03                      0
Fighting          84.68          25.65                     14.55
Ship crashing     91.81          7.54                      26.87
Dance party       99.68          0.30                      2.08
Average           91.63          10.29                     12.64
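The 'Average' row of Table 10.7 appears to be the unweighted mean of the five
per-concept rows. The sketch below checks that reading; it is an illustration only,
and the names table_10_7 and column_means are assumptions introduced here rather
than anything defined in the source.

```python
# Per-concept results from Table 10.7:
# concept -> (accuracy %, false positive rate %, false negative rate %).
table_10_7 = {
    "Love scene": (90.97, 8.91, 19.70),
    "Music video": (91.03, 9.03, 0.0),
    "Fighting": (84.68, 25.65, 14.55),
    "Ship crashing": (91.81, 7.54, 26.87),
    "Dance party": (99.68, 0.30, 2.08),
}


def column_means(rows):
    """Unweighted mean of each column, rounded to two decimals."""
    columns = zip(*rows.values())
    return tuple(round(sum(col) / len(col), 2) for col in columns)


averages = column_means(table_10_7)
print(averages)
```

Because every concept has the same number of test samples (6,000 in Table 10.6), an
unweighted mean is a natural way to summarise the per-concept rates.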
concept. This showed, however, that 73.13 % of all relevant videos were correctly
classified. Moreover, the system attained its lowest false negative rates, 0 % and
about 2 %, for the detection of 'Music Video' and 'Dance Party', respectively. For
these concepts, we observed that the audio features extracted from the video clips
contributed strongly to the effectiveness of the classifier. In addition, the
consistency of the visual scenes in the video clips representing 'Dance Party', together
with the music in the audio track, enabled the classifier to achieve close to 100 %
classification accuracy.
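The 73.13 % figure matches the 'Ship crashing' row of Table 10.7 if the share of
relevant clips correctly classified is taken as 100 minus the false negative rate.
The following sketch applies that assumption to every concept; the names
false_negative_rate and recall are illustrative and do not come from the source.

```python
# False negative rates (%) from Table 10.7.
false_negative_rate = {
    "Love scene": 19.70,
    "Music video": 0.0,
    "Fighting": 14.55,
    "Ship crashing": 26.87,
    "Dance party": 2.08,
}

# Assumption: the false negative rate is measured over all relevant clips,
# so the percentage of relevant clips correctly classified is 100 - FNR.
recall = {
    concept: round(100.0 - fnr, 2)
    for concept, fnr in false_negative_rate.items()
}

# Concepts ordered from fewest to most missed relevant clips.
ranked = sorted(false_negative_rate, key=false_negative_rate.get)

for concept in ranked:
    print(f"{concept:14s} recall = {recall[concept]:.2f} %")
```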
The ground truth in Table 10.6 may be used to study the generalization capabilities
of the SVM-based fusion model by examining the properties of the video
test set. The table shows the number of instances of each concept in the database
and the number of different movies in which each concept occurs. These data show
that three of the concepts occur in more than one movie (i.e., Love Scene, Music
Video, and Fighting). The results discussed above indicate that, for these concepts,
the system can classify relevant video clips correctly even though they come from
different movies. Thus, this learning system attains generalization capabilities to
some degree. Furthermore, this work aims at characterizing semantic concepts in
terms of the perceptual features provided by the experimental database. These concepts
may not generalize as well as those described by textual descriptors.
It is well known that, to avoid classification errors, the numbers of positive and
negative examples should not differ greatly when training an SVM. As noted
from the results, positive samples were more important than negative samples for
conducting effective training. Here, the performance of the classifier was studied
 