Chapter 9
Scalable Video Genre Classification
and Event Detection
Abstract This chapter presents a systematic and generic approach to scalable video genre classification and event detection. The system addresses the event detection scenario for an input video through an ordered sequential process. First, domain-knowledge-independent local descriptors are extracted homogeneously from the input video sequence. A video representation is then created by adopting a Bag-of-Words (BoW) model. The video's genre is first identified by applying k-nearest neighbor (k-NN) classifiers to this video representation; various dissimilarity measures are assessed and evaluated analytically. For high-level event detection, a hidden conditional random field (HCRF) structured prediction model is then employed to detect interesting events. The input to this event detection stage relies on mid-level view agents that characterize each frame of the video sequence as one of four view groups, namely close-up-view, mid-view, long-view, and outer-field-view. An unsupervised approach based on probabilistic latent semantic analysis (PLSA) is applied to the histogram-based video representation to obtain these mid-level view groups. The framework demonstrates efficiency and generality in processing voluminous video collections and accomplishes various video analysis tasks. Its effectiveness is justified by extensive experimentation, and results are compared with benchmarks and state-of-the-art algorithms. Limited human expertise and effort are involved in both the domain-knowledge-independent video representation and the annotation-free unsupervised view labeling. As a result, such a systematic and scalable approach can be widely applied to processing massive video collections generically.
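To make the genre-classification stage concrete, the sketch below illustrates one plausible realization of the pipeline described above: local descriptors are quantized into a BoW histogram and the genre is predicted with a k-NN classifier under a chosen dissimilarity measure. This is a minimal illustration assuming scikit-learn; all function names, the codebook size, and the chi-squared measure are illustrative assumptions, not the chapter's exact implementation.

```python
# Minimal sketch of BoW video representation + k-NN genre classification.
# Names, codebook size, and the chi-squared measure are illustrative only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def build_codebook(descriptors, k=500, seed=0):
    """Cluster pooled local descriptors (n x d) into a k-word visual vocabulary."""
    return KMeans(n_clusters=k, random_state=seed, n_init=10).fit(descriptors)

def bow_histogram(descriptors, codebook):
    """Quantize one video's descriptors and return an L1-normalized histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared dissimilarity; one candidate among the measures evaluated."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def train_genre_classifier(histograms, genre_labels, k=5):
    """Fit a k-NN genre classifier over BoW histograms with the chosen metric."""
    clf = KNeighborsClassifier(n_neighbors=k, metric=chi2_distance)
    clf.fit(histograms, genre_labels)
    return clf
```

In practice the dissimilarity measure is a design choice; the chapter's analytical comparison of several measures is what motivates treating the metric as a pluggable parameter here.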
9.1 Introduction
The bag-of-words (BoW) model and its application in image classification have
been used in various aspects of video analysis. Because of its robustness in
matching semantic objects using local descriptors, the BoW concept has been
used in video object reoccurrence detection [231, 232], semantic shot detection
[233, 234] and grouping [235], and object-based video retrieval [236, 237]. Some
other representative works in video analysis adopted BoW models with feature
tracking along the temporal course, including matching semantically similar videos
built by local features using spatiotemporal volumes [238]; content-based video