The HMM algorithm is also provided in Eq. (9.15) and depicted in Fig. 9.4c.
$$P(Y \mid X) = \frac{P(X, Y)}{P(X)} = \prod_{t} P(X_t \mid Y_t) \cdot P(Y_t \mid Y_{t-1}) \qquad (9.15)$$
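As a rough illustration of the product in Eq. (9.15), the sketch below evaluates it in log-space for one given state path. It is a minimal sketch under stated assumptions, not the authors' implementation: observations are taken to be discrete symbols (for example codeword indices), the first transition term P(Y_1 | Y_0) is treated as an initial state distribution, and the function name `hmm_log_joint` and all toy parameters are hypothetical.

```python
import numpy as np


def hmm_log_joint(obs, states, log_init, log_trans, log_emit):
    """Sum over t of log P(X_t | Y_t) + log P(Y_t | Y_{t-1}) for one state path.

    Assumption: the t = 1 transition term is replaced by an initial
    state distribution, since Y_0 is not defined in the text.
    """
    total = log_init[states[0]] + log_emit[states[0], obs[0]]
    for t in range(1, len(obs)):
        total += log_trans[states[t - 1], states[t]]  # log P(Y_t | Y_{t-1})
        total += log_emit[states[t], obs[t]]          # log P(X_t | Y_t)
    return float(total)


# Toy model (hypothetical numbers): 2 hidden states, 3 observation symbols.
init = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3],
                  [0.4, 0.6]])
emit = np.array([[0.5, 0.4, 0.1],
                 [0.1, 0.3, 0.6]])

obs = [0, 1, 2]      # observed symbol indices X_1..X_3
states = [0, 0, 1]   # one candidate state path Y_1..Y_3

log_p = hmm_log_joint(obs, states, np.log(init), np.log(trans), np.log(emit))
joint = float(np.exp(log_p))
```

Working in log-space avoids numerical underflow when the product runs over long sequences such as the 500-frame clips used later in the chapter.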
The three structured prediction models above use different decision-making
schemes for the final event detection. For the HMM, the query sequence is
tested, and the highest HMM likelihood provides the final decision in event
detection. In the CRF model, by contrast, each state variable Y(t) requires a
label, as Fig. 9.4b shows, so a majority-vote scheme is used in which the most
frequent event label along the Y sequence decides the event result. For the HCRF
model depicted in Fig. 9.4a, a multi-class training process that recognizes all
classes at the same time is adopted. The detected event with the highest
probability is therefore taken as the final result for the query sequence.
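The decision rules described above can be sketched in a few lines. This is only an illustration under assumptions: the function names, the event labels `"score"` and `"no_score"`, and all numeric scores are hypothetical, and the HMM rule assumes that one likelihood is available per candidate event.

```python
from collections import Counter


def decide_by_highest_score(scores):
    """HMM- or HCRF-style rule: return the event with the highest score.

    `scores` maps an event label to a (log-)likelihood or a class probability.
    """
    return max(scores, key=scores.get)


def decide_by_majority(frame_labels):
    """CRF-style rule: the most frequent label along the Y sequence wins."""
    return Counter(frame_labels).most_common(1)[0][0]


# Hypothetical per-event log-likelihoods for one query sequence (HMM case).
hmm_decision = decide_by_highest_score({"score": -12.3, "no_score": -15.8})

# Hypothetical per-state labels produced along the Y sequence (CRF case).
crf_decision = decide_by_majority(["score", "score", "no_score", "score"])

# Hypothetical class probabilities from a multi-class model (HCRF case).
hcrf_decision = decide_by_highest_score({"score": 0.35, "no_score": 0.65})
```

The majority rule is the only one that needs a label per state variable. The other two operate on a single score per candidate event for the whole query sequence.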
9.4 Experimental Result
In this section, experimental results are presented to justify the properties
of the proposed generic framework, using a relatively large-scale video
collection that covers 23 genres and totals 145 h, gathered by the authors and
their co-workers and named the 23-sports dataset. To the best of our knowledge,
this dataset is the most diverse in video genres, collected from both the
internet and television. All the video clips have the same length of 167 s, with
a total of 500 uniformly sampled frames at a sampling rate of three frames per
second. The dataset is composed of 3,122 clips. In training, 1,198 clips are
used, of which a subset of 46 clips (2 clips per sport) is used in codebook
generation, with a total of 3,112,341 SIFT points. In testing, the other 1,924
clips are used.
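The dataset figures quoted above are mutually consistent, which a short arithmetic check makes explicit. The constant names below are hypothetical and all values are copied from the text.

```python
# Figures quoted in the text for the 23-sports dataset.
CLIP_SECONDS = 167
FRAMES_PER_SECOND = 3
GENRES = 23
CODEBOOK_CLIPS_PER_GENRE = 2
TOTAL_CLIPS = 3122
TRAIN_CLIPS = 1198
TEST_CLIPS = 1924

# 167 s at 3 frames per second gives 501 frames, i.e. about 500 per clip.
frames_per_clip = CLIP_SECONDS * FRAMES_PER_SECOND

# 3,122 clips of 167 s each come to roughly 144.8 h, reported as 145 h.
total_hours = TOTAL_CLIPS * CLIP_SECONDS / 3600

# 2 clips for each of the 23 sports gives the 46 codebook clips.
codebook_clips = GENRES * CODEBOOK_CLIPS_PER_GENRE

# Training and testing clips together account for the whole dataset.
split_total = TRAIN_CLIPS + TEST_CLIPS
```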
Various codebook sizes were studied first. The proposed system was then
evaluated in three experiments, with the detection of a particular event as the
ultimate measure: (1) genre categorization using the proposed bottom-up codebook
generation is analyzed; (2) view classification results are assessed and compared
using both supervised and unsupervised classifiers; (3) finally, the coarse-to-fine
event detection is examined by investigating the basketball score event. The
validity of the score event detection can be extended to other event scenarios
with labeled video sequences. The detailed argument can be found in Sect. 9.4.3.
To investigate the effect of codebook size, a 14-sport subset of the 23-sports
dataset was used. The number of clips per sport ranges from 70 to 106, averaging
87, and each clip is a uniform 167 s in length. Two experiments were conducted on
codebook size selection, one for genre categorization and one for view
classification. For genre categorization, the average accuracy over all sports as
a function of codebook size is shown in Fig. 9.6a. The plot reaches a plateau
after codebook size 800, and starts to drop