Chapter 9
Scalable Video Genre Classification
and Event Detection
Abstract This chapter presents a systematic and generic approach to scalable video genre classification and event detection. The system addresses the event detection scenario for an input video through an ordered sequential process. First, domain-knowledge-independent local descriptors are extracted homogeneously from the input video sequence. A video representation is then created by adopting a Bag-of-Words (BoW) model. The video's genre is first identified by applying k-nearest neighbor (k-NN) classifiers to this video representation; various dissimilarity measures are assessed and evaluated analytically. For high-level event detection, a hidden conditional random field (HCRF) structured prediction model is then employed to detect interesting events. The input to this event detection stage relies on mid-level view agents that characterize each frame of the video sequence as one of four view groups, namely close-up-view, mid-view, long-view, and outer-field-view. An unsupervised approach based on probabilistic latent semantic analysis (PLSA) is applied to the histogram-based video representation to obtain these mid-level view groups. The framework demonstrates efficiency and generality in processing voluminous video collections and accomplishes various video analysis tasks. Its effectiveness is justified by extensive experimentation, and results are compared with benchmarks and state-of-the-art algorithms. Limited human expertise and effort are involved in both the domain-knowledge-independent video representation and the annotation-free unsupervised view labeling. As a result, such a systematic and scalable approach can be widely applied to processing massive video collections generically.
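To make the genre-classification stage concrete, the sketch below illustrates one plausible realization of the pipeline described above: local descriptors are quantized into a BoW histogram and the genre is predicted with a k-NN classifier under a chosen dissimilarity measure. This is a minimal illustration assuming scikit-learn; all function names, the codebook size, and the chi-squared measure are illustrative assumptions, not the chapter's exact implementation.

```python
# Minimal sketch of BoW video representation + k-NN genre classification.
# Names, codebook size, and the chi-squared measure are illustrative only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def build_codebook(descriptors, k=500, seed=0):
    """Cluster pooled local descriptors (n x d) into a k-word visual vocabulary."""
    return KMeans(n_clusters=k, random_state=seed, n_init=10).fit(descriptors)

def bow_histogram(descriptors, codebook):
    """Quantize one video's descriptors and return an L1-normalized histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared dissimilarity; one candidate among the measures evaluated."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def train_genre_classifier(histograms, genre_labels, k=5):
    """Fit a k-NN genre classifier over BoW histograms with the chosen metric."""
    clf = KNeighborsClassifier(n_neighbors=k, metric=chi2_distance)
    clf.fit(histograms, genre_labels)
    return clf
```

In practice the dissimilarity measure is a design choice; the chapter's analytical comparison of several measures is what motivates treating the metric as a pluggable parameter here.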
9.1 Introduction
The bag-of-words (BoW) model and its application in image classification have
been used in various aspects of video analysis. Because of its robustness in
matching semantic objects using local descriptors, the BoW concept has been
used in video object reoccurrence detection [231, 232], semantic shot detection
[233, 234] and grouping [235], and object-based video retrieval [236, 237]. Some
other representative works in video analysis adopted BoW models with feature
tracking along the temporal course, including matching semantically similar videos
built by local features using spatiotemporal volumes [238]; content-based video