Scalable Video Genre Classification and Event Detection - Multimedia Database Retrieval: Technology and Applications


Event detection is the third and final task; it is preceded by video genre categorization and semantic view type classification. By completing these three tasks in sequence, events can be detected with minimal domain knowledge and partially labeled data. Although we apply our methods to sports video, the generic nature of the proposed framework makes it applicable to other video corpora.

The novelty of this framework lies in the following three aspects:

1. Domain knowledge-free local descriptors are extracted using a homogeneous process. The BoW model is used to build a histogram-based distribution that represents each video clip. A BoW-based video representation using local features is a natural choice for processing videos generically, because it requires no domain knowledge.

2. An unsupervised classifier with a homogeneous process is proposed. This method is chosen because unlabeled data makes up the major portion of all digital content, so an automatic and systematic process can be deployed on a large-scale dataset. Since sports videos have well-defined semantic view types arising from their production characteristics, local features combined with the BoW model are well suited to view classification. Such a combination has also proven successful in computer vision and object recognition. Therefore, a probabilistic latent semantic analysis (PLSA)-based method for semantic view classification is preferred due to its unsupervised nature and applicability to the BoW model.

3. A structured prediction model is adopted that takes labeled middle-level agents as input to infer high-level semantics. This choice is made because sports videos have distinguishable temporal patterns, often consisting of sequences of middle-level agents. In our work, since semantic view types have been classified in item 2, an appropriate method is to take the view results as input and perform semantic event detection. Therefore, the hidden conditional random field (HCRF) is introduced as a rational choice. The significance of the HCRF is its generalized modeling, which lies in both the relaxation of the Markov property and the incorporation of hidden states into conditional random field (CRF) modeling.
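
The histogram-based BoW representation in item 1 can be illustrated with a minimal sketch. It assumes that local descriptors have already been extracted and that a codebook of visual words is available (for example, from clustering); the function name `bow_histogram`, the toy codebook, and the nearest-codeword assignment by Euclidean distance are illustrative assumptions, not details given in the text.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Map local descriptors to their nearest codeword and return a normalized histogram.

    descriptors: (n_descriptors, dim) array of local features (assumed precomputed)
    codebook:    (n_words, dim) array of visual words (assumed precomputed)
    """
    # Squared Euclidean distance from every descriptor to every codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Index of the closest codeword for each descriptor
    words = d2.argmin(axis=1)
    # Count occurrences of each visual word
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    total = hist.sum()
    # Normalize so that clips with different numbers of descriptors are comparable
    return hist / total if total > 0 else hist

# Toy example: three 2-D codewords and four descriptors (hypothetical values)
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.7, 5.2]])
hist = bow_histogram(descriptors, codebook)
```

Because the histogram depends only on descriptor-to-codeword assignments, the same routine applies to any video genre without genre-specific rules, which is the sense in which the representation is domain knowledge-free.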

In the following, an overview of the proposed system is first presented with a flowchart, followed by video representation using the BoW model and low-level genre categorization. Then, the proposed techniques are introduced, including unsupervised learning for middle-level view classification and HCRF for high-level event detection. Experimental results are then provided to demonstrate the effectiveness of the proposed method.
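
To make the unsupervised middle-level step more concrete, the following is a minimal sketch of standard PLSA fitted by expectation-maximization on a matrix of BoW counts. It is not the authors' implementation: the function name `plsa`, the random initialization, the fixed iteration count, and the toy count matrix are all assumptions made for illustration.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM.

    counts: (n_docs, n_words) matrix of visual-word counts, one row per clip or frame
    Returns p_z_d with shape (n_docs, n_topics) and p_w_z with shape (n_topics, n_words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialization of P(z|d) and P(w|z), each row normalized to sum to 1
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]      # (docs, topics, words)
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: re-estimate both distributions from count-weighted posteriors
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps
    return p_z_d, p_w_z

# Toy count matrix: four documents over a four-word vocabulary (hypothetical values)
counts = np.array([[8, 7, 0, 0],
                   [9, 6, 1, 0],
                   [0, 0, 7, 9],
                   [0, 1, 8, 6]], dtype=float)
p_z_d, p_w_z = plsa(counts, n_topics=2)
labels = p_z_d.argmax(axis=1)  # most probable latent topic per document
```

Under this sketch, the most probable latent topic of each document can serve as an unsupervised label; how latent topics map to semantic view types is a separate design decision that the sketch does not address.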

9.1.1 Overview

This section provides a holistic overview of the system, as illustrated in Fig. 9.1. The input video is analyzed systematically using a generic and sequential framework. This video is interpreted in a way such that the result from a preceding
