Video Scene Analysis: A Machine Learning Perspective - Video Segmentation and Its Applications


Fig. 4.12 Structured prediction models. (a) Hidden conditional random field (HCRF). (b) Conditional random field (CRF). (c) Hidden Markov model (HMM)

$X = \{x_1, x_2, \ldots, x_m, \ldots, x_M\}$, with each $x_m$ representing a local state observation along the HCRF structure. The three structured prediction models above reach a decision on an event at different stages. For the HMM, the query sequence is tested, and the HMM with the highest likelihood provides the final decision in event detection. In the CRF model, on the other hand, each state variable $Y^{(t)}$ requires a label, as Fig. 4.12b shows, so a majority-rule voting scheme is used in which the most frequent event label along the $Y$ sequence decides the event result. For the HCRF model depicted in Fig. 4.12a, a multiclass training process that recognizes all classes at the same time is adopted. The detected event with the highest probability is therefore taken as the final result for the query sequence.
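The three decision rules described above can be illustrated with a minimal Python sketch. It assumes the trained models have already produced their outputs: the function names, the event names, and all numeric scores are hypothetical placeholders, not values or code from the authors.

```python
from collections import Counter


def hmm_decision(log_likelihoods):
    """Return the event whose HMM gives the query sequence the highest likelihood."""
    return max(log_likelihoods, key=log_likelihoods.get)


def crf_decision(state_labels):
    """Return the most frequent label along the CRF state sequence (majority vote)."""
    return Counter(state_labels).most_common(1)[0][0]


def hcrf_decision(class_probs):
    """Return the class with the highest sequence-level probability."""
    return max(class_probs, key=class_probs.get)


# Hypothetical model outputs for one query sequence (illustrative only).
hmm_scores = {"score": -120.5, "time-out": -131.2, "intermission": -140.0}
crf_labels = ["score", "score", "time-out", "score", "intermission"]
hcrf_probs = {"score": 0.71, "time-out": 0.19, "intermission": 0.10}

print(hmm_decision(hmm_scores))
print(crf_decision(crf_labels))
print(hcrf_decision(hcrf_probs))
```

The sketch only makes the contrast visible: the HMM and HCRF rules pick a maximum over sequence-level scores, while the CRF rule first labels every state and then aggregates the labels by voting.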

4.4.1.3 Experiments and Results

In the following, experimental results are presented to demonstrate the properties of the proposed generic framework. The experiments use a relatively large-scale video collection gathered by the authors, named the 23-sports dataset, which covers 23 genres with a total of 145 h of video. All video clips have the same length of 167 s, corresponding to 500 uniformly sampled frames at a sampling rate of 3 frames per second. The dataset is composed of 3,122 clips. In training, 1,198 clips are used, of which a subset of 46 clips (2 clips per sport) is used in codebook generation, with a total of 3,112,341 SIFT points. The other 1,924 clips are used in testing.
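As a quick consistency check, the dataset figures quoted above can be reproduced with a few lines of arithmetic. The variable names are chosen here for illustration; the numbers are the ones stated in the text.

```python
frames_per_clip = 500
sampling_rate_fps = 3
n_train, n_test = 1198, 1924
n_genres = 23
codebook_clips_per_genre = 2

# 500 frames sampled at 3 frames per second span roughly 167 s.
clip_seconds = frames_per_clip / sampling_rate_fps

# Training and testing clips together make up the whole dataset.
n_clips = n_train + n_test

# Total duration of the collection in hours.
total_hours = n_clips * clip_seconds / 3600

# Two clips per sport are used for codebook generation.
codebook_clips = n_genres * codebook_clips_per_genre

print(round(clip_seconds), n_clips, round(total_hours), codebook_clips)
# prints: 167 3122 145 46
```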

In this experiment, basketball score event detection is investigated using this labeled video data. A two-stage coarse-to-fine scheme is adopted. The first stage detects changes in scoreboard information using the technique introduced in [39], which locates an entry point for an event of interest. However, this coarse detection provides only a rough, static-frame-based estimate of the entry point. Since scoreboard information appears not only in score events but also in time-out and intermission events, detection based on individual frames, without temporally structured information, cannot provide robust and satisfactory results. A fine-tuning stage is therefore adopted to finalize the detection and ensure that the query video truly conveys the score event as its semantic theme. The proposed HCRF model is deployed for this stage after the first-stage coarse detection. Experimental results obtained with this HCRF model are compared with CRF and HMM baselines.
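The control flow of such a two-stage scheme might look like the following Python sketch. It is not the authors' implementation: the scoreboard-change detector of [39] is replaced by a simple comparison of consecutive scoreboard readings, and the sequence classifier (the role played by the HCRF) is a hypothetical toy function passed in as `classify_sequence`. All names and data are assumptions made for illustration.

```python
def coarse_entry_points(scoreboard_readings):
    """Stage 1 (stand-in): frame indices where the scoreboard reading changes."""
    return [i for i in range(1, len(scoreboard_readings))
            if scoreboard_readings[i] != scoreboard_readings[i - 1]]


def fine_verify(frames, entry, half_window, classify_sequence):
    """Stage 2: classify the temporal window of frames around an entry point."""
    start = max(0, entry - half_window)
    window = frames[start:entry + half_window + 1]
    return classify_sequence(window)


def detect_score_events(frames, scoreboard_readings, classify_sequence,
                        half_window=2):
    """Keep only the entry points whose surrounding window is labeled 'score'."""
    return [entry for entry in coarse_entry_points(scoreboard_readings)
            if fine_verify(frames, entry, half_window, classify_sequence) == "score"]


def toy_classifier(window):
    """Hypothetical stand-in for a trained sequence model such as an HCRF."""
    return "score" if "shot" in window else "time-out"


# Toy data: the scoreboard changes at frame 3 (a basket) and frame 6 (a time-out).
frames = ["play", "play", "shot", "play", "play", "huddle", "huddle", "play"]
readings = ["10-8", "10-8", "10-8", "12-8", "12-8", "12-8", "TO", "TO"]

print(detect_score_events(frames, readings, toy_classifier))
# prints: [3]
```

In this toy run the coarse stage flags both frame 3 and frame 6, and only the temporal window around frame 3 is confirmed as a score event, which mirrors why frame-level scoreboard detection alone is not sufficient.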
