Digital Signal Processing Reference
In-Depth Information
Fig. 4.12 Structured prediction models. (a): Hidden conditional random field (HCRF). (b): Conditional
random field (CRF). (c): Hidden Markov model (HMM)
$X = \{x_1, x_2, \ldots, x_m, \ldots, x_M\}$, with each $x_m$ representing a local state observation along
the HCRF structure. The three structured prediction models above reach the event
detection decision at different stages. For the HMM, the query sequence is tested
and the HMM yielding the highest likelihood provides the final decision. In the CRF
model, on the other hand, each state variable $Y^{(t)}$ requires a label, as Fig. 4.12b
shows, so a majority-rule voting scheme is used in which the most frequent event
label along the Y sequence decides the event result. For the HCRF model depicted in
Fig. 4.12a, a multiclass training process that recognizes all classes at the same
time is adopted. Therefore, the detected event with the highest probability is taken
as the final result for the query sequence.
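The three decision rules described above can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function names, the dictionary inputs (per-event log-likelihoods and class probabilities), and the label strings are all hypothetical, and the models that would produce these scores are assumed to exist already.

```python
from collections import Counter


def hmm_decision(log_likelihoods):
    """HMM rule: choose the event whose model scores the query sequence highest.

    `log_likelihoods` maps an event name to the (hypothetical) log-likelihood
    of the query sequence under that event's HMM.
    """
    return max(log_likelihoods, key=log_likelihoods.get)


def crf_decision(state_labels):
    """CRF rule: majority vote over the labels assigned along the Y sequence."""
    return Counter(state_labels).most_common(1)[0][0]


def hcrf_decision(class_probs):
    """HCRF rule: choose the class with the highest sequence-level probability.

    `class_probs` maps an event name to the (hypothetical) probability that
    the multiclass HCRF assigns to the whole query sequence.
    """
    return max(class_probs, key=class_probs.get)
```

Note that only the CRF rule aggregates per-state labels; the HMM and HCRF rules both compare one score per candidate event for the whole sequence.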
4.4.1.3 Experiments and Results
In the following, experimental results are presented to justify the properties of the
proposed generic framework, using a relatively large-scale video collection gathered
by the authors, named the 23-sports dataset, which covers 23 genres with a total of
145 h of video. All the video clips have the same length of 167 s, with a total of 500
uniformly sampled frames per clip at a sampling rate of 3 frames per second. The
dataset is composed of 3,122 clips. For training, 1,198 clips are used, of which a
subset of 46 clips (2 clips per sport) is used for codebook generation, with a total of
3,112,341 SIFT points. The remaining 1,924 clips are used for testing.
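As a quick consistency check, the dataset figures quoted above can be reproduced with a few lines of arithmetic. The variable names are illustrative only; all numbers are taken from the text.

```python
# Figures quoted in the text for the 23-sports dataset.
clips_total = 3122
clip_seconds = 167
train_clips, test_clips = 1198, 1924
codebook_clips, genres = 46, 23

# Total duration in hours: 3,122 clips of 167 s each.
total_hours = clips_total * clip_seconds / 3600

# Frames per clip at 3 frames per second (the text rounds this to 500).
frames_per_clip = clip_seconds * 3

print(f"total duration: {total_hours:.1f} h")
print(f"frames per clip at 3 fps: {frames_per_clip}")
print(f"train + test clips: {train_clips + test_clips}")
print(f"codebook clips per genre: {codebook_clips // genres}")
```

The total duration comes out just under 145 h, and the training and testing clips together account for all 3,122 clips.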
In this experiment, the task of basketball score event detection is investigated
using these labeled video sequences. A two-stage coarse-to-fine scheme is adopted.
The first stage detects scoreboard information changes using the technique introduced
in [39], which locates an entry point of an interesting event. However, this coarse
detection provides only a rough, static, frame-based estimate of the entry point.
Since scoreboard information appears not only in score events but also in time-out
and intermission events, individual frame-based detection without temporally
structured information cannot provide robust and satisfactory results. Therefore, a
fine-tuning stage that finalizes the detection is adopted to ensure that the query
video truly conveys the score event as its semantic theme. The proposed HCRF model
is deployed as this second stage, after the first-stage coarse detection.
Experimental results obtained with this HCRF model are compared with CRF and HMM
baselines.
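The coarse-to-fine scheme can be sketched as a two-stage loop. This is only a structural illustration under stated assumptions: `scoreboard_changed` stands in for the scoreboard change detector of [39], `classify_window` stands in for the trained sequence model (the HCRF, or a CRF or HMM baseline), and the `window` size and the `"score"` label are hypothetical choices that do not come from the text.

```python
from typing import Callable, List, Sequence


def detect_score_events(
    frames: Sequence,
    scoreboard_changed: Callable[[object, object], bool],
    classify_window: Callable[[Sequence], str],
    window: int = 30,  # hypothetical context size, in frames
) -> List[int]:
    """Return the entry-point frame indices confirmed as score events."""
    confirmed = []
    for i in range(1, len(frames)):
        # Stage 1 (coarse): static, frame-based cue from the scoreboard.
        if not scoreboard_changed(frames[i - 1], frames[i]):
            continue
        # Stage 2 (fine): classify the temporal context around the entry
        # point, so time-outs and intermissions can be rejected.
        start = max(0, i - window)
        segment = frames[start : i + window]
        if classify_window(segment) == "score":
            confirmed.append(i)
    return confirmed
```

Passing the two stages in as callables keeps the sketch independent of any particular detector or model, so the same loop could be used to compare the HCRF against the CRF and HMM baselines.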