Event detection is the third and final task, preceded by two others: video genre
categorization and semantic view type classification. By completing these tasks
in sequence, event detection can be achieved with minimal domain knowledge and
partially labeled data. Although we apply our methods to sports video, the
generic nature of the proposed framework makes it applicable to other video
corpora.
The novelty of this framework lies in the following three aspects:
1. Domain knowledge-free local descriptors are extracted using a homogeneous
process. The bag-of-words (BoW) model is used to build a histogram-based
distribution that represents each video clip. A BoW-based video representation
using local features is a natural choice for generic video processing because
it requires no domain knowledge.
2. An unsupervised classifier with a homogeneous process is proposed. This method
is chosen because unlabeled data makes up the major portion of all digital
content, so an automatic and systematic process can be deployed on a
large-scale dataset. Since sports videos have well-defined semantic view types
arising from their production characteristics, local features combined with the
BoW model are a strong candidate for view classification. Such a combination has
also proven successful in computer vision and object recognition. Therefore, a
probabilistic latent semantic analysis (PLSA)-based method for semantic view
classification is preferred due to its unsupervised nature and applicability to the
BoW model.
3. A structured prediction model is adopted that takes labeled middle-level agents
as input to infer high-level semantics. This choice is made because sports videos
have distinguishable temporal patterns, often consisting of sequences of
middle-level agents. In our work, since semantic view types have been classified in
part (2), an appropriate method is to take the view results as input and perform
semantic event detection. Therefore, the hidden conditional random field (HCRF) is
introduced as a rational choice. The significance of the HCRF is its generalized
modeling, which lies in both the relaxation of the Markov property and the
incorporation of hidden states into conditional random field (CRF) modeling.
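The BoW representation in aspect (1) can be illustrated with a minimal sketch. It assumes a codebook of visual words is already available (for example, from clustering local descriptors, which is not shown here); the function names, the toy two-dimensional descriptors, and the Euclidean nearest-codeword rule are illustrative assumptions, not details taken from the framework itself.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(descriptor, codebook[k]))


def bow_histogram(descriptors, codebook):
    """Normalized histogram of codeword assignments for one video clip."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        # A clip with no descriptors yields an all-zero histogram.
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Hypothetical toy data: three visual words and four local descriptors.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
clip_descriptors = [[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]]
histogram = bow_histogram(clip_descriptors, codebook)
print(histogram)  # [0.25, 0.5, 0.25]
```

Because the histogram depends only on descriptor-to-codeword assignments, the same routine applies to any video genre without domain-specific rules, which is the property the framework relies on.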
In the following, an overview of the proposed system is first presented with
a flowchart, followed by video representation using the BoW model and
low-level genre categorization. Then, the proposed techniques are introduced,
including unsupervised learning for middle-level view classification and HCRF
for high-level event detection. Experimental results are then provided to
demonstrate the effectiveness of the proposed method.
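As a rough illustration of the unsupervised, PLSA-based view classification mentioned above, the following sketch runs the standard PLSA expectation-maximization updates on a clip-by-codeword count matrix. The function name `plsa`, the toy counts, the random initialization, and the rule of assigning each clip to its most probable latent topic are assumptions made for illustration; the sketch also assumes every clip has at least one nonzero count.

```python
import math
import random


def normalize(values):
    """Scale non-negative values to sum to one (uniform if all are zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit P(z|d) and P(w|z) by EM; also return the log-likelihood history."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Strictly positive random initialization breaks topic symmetry.
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    history = []
    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        log_likelihood = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                mixture = sum(joint)
                log_likelihood += n * math.log(mixture)
                for z in range(n_topics):
                    responsibility = n * joint[z] / mixture
                    new_z_d[d][z] += responsibility
                    new_w_z[z][w] += responsibility
        # M-step: renormalize the accumulated expected counts.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
        history.append(log_likelihood)
    return p_z_d, p_w_z, history


# Hypothetical toy data: four clips described by counts over four codewords.
counts = [[5, 5, 0, 0], [4, 6, 0, 0], [0, 0, 5, 5], [0, 0, 6, 4]]
p_z_d, p_w_z, history = plsa(counts, n_topics=2, n_iter=30)
# Assign each clip to its most probable latent topic (an assumed decision rule).
view_of_clip = [max(range(2), key=lambda z: row[z]) for row in p_z_d]
print(view_of_clip)
```

No labels are used at any point: the latent topics are learned from the BoW counts alone, and mapping topics to named view types would be a separate step.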
9.1.1 Overview
This section provides an overview from a holistic perspective as illustrated in
Fig. 9.1. The input video is analyzed systematically using a generic and sequential
framework. This video is interpreted in a way such that the result from a preceding