than the other two model types (i.e., the pattern-recognition model and the semantic event model). In addition, among the state event models, most methods utilize middle-level semantic agents. In our work, the adopted four-category view type definition is one of the most popular classification schemes in the literature. Last and most importantly, the input of our event detection model is a sequence of labeled views produced by a domain knowledge-free method (either PLSA or SVM) operating on a generic video representation. Since the proposed HCRF-based model achieves better accuracy than the baseline HMM- and CRF-based models, this performance should be maintained on other labeled sequences that could form various event scenarios. Moreover, using sequences labeled by the middle-level agents as input is also common in peers' work on state event models [275, 276, 278, 279].
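As a minimal, self-contained sketch of how a sequence of middle-level view labels can drive event detection, the snippet below scores the label sequence under one discrete HMM per event class, i.e., the structure of the HMM baseline mentioned above rather than the HCRF itself, and picks the best-scoring event. The two-state toy parameters and the event names are purely hypothetical.

```python
import numpy as np

def log_forward(obs, log_pi, log_A, log_B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm entirely in log space."""
    alpha = log_pi + log_B[:, obs[0]]          # initial step
    for o in obs[1:]:
        # logsumexp over previous states, then emit the current symbol
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)

def detect_event(view_labels, event_models):
    """Return the event whose model best explains the labeled view sequence."""
    obs = np.asarray(view_labels)
    scores = {e: log_forward(obs, *params) for e, params in event_models.items()}
    return max(scores, key=scores.get)

def to_log(pi, A, B):
    """Convert probability parameters to log space."""
    return tuple(np.log(np.asarray(m)) for m in (pi, A, B))

# Hypothetical two-state models over 4 view types (0..3); real parameters
# would be learned from training sequences.
event_models = {
    "event_A": to_log([0.6, 0.4],
                      [[0.7, 0.3], [0.4, 0.6]],
                      [[0.5, 0.3, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4]]),
    "event_B": to_log([0.5, 0.5],
                      [[0.9, 0.1], [0.2, 0.8]],
                      [[0.25, 0.25, 0.25, 0.25], [0.1, 0.6, 0.2, 0.1]]),
}
print(detect_event([0, 0, 2, 3, 3, 1], event_models))
```

The HCRF used in this chapter additionally learns hidden states discriminatively over the whole sequence, but the interface is the same: a sequence of view labels in, an event label out.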
9.5 Summary
This chapter focuses on scalable video genre classification and event detection with the help of middle-level view agents. We introduce the BoW model, together with unsupervised learning algorithms, for analyzing large-scale video datasets generically and systematically. Three video tasks are investigated in a coherent and sequential order. After all data are processed uniformly at the feature extraction stage using domain knowledge-free local SIFT descriptors, video sequences are represented by the compact and concise BoW model. A systematic scheme is then employed for interesting event detection, taking the video sequence as the query. In this framework, after its genre is identified by a k-NN classifier, the query video undergoes semantic view assignment in the second stage using the PLSA model. Both the genre identification and view classification tasks take the initially processed video representation as input and use unsupervised algorithms as classifiers. Finally, in the third task, the interesting event is detected by feeding the view labels into an HCRF-structured prediction model.
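To make the early stages of this pipeline concrete, the sketch below builds a BoW representation from SIFT descriptors and identifies the genre with a k-NN classifier, roughly mirroring the steps described above. The OpenCV/scikit-learn calls, the vocabulary size of 500, and the 5-nearest-neighbor setting are illustrative assumptions rather than the chapter's actual configuration.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def bow_histogram(frames, kmeans):
    """Pool SIFT descriptors from a video's frames into one L1-normalized
    bag-of-words histogram over the learned visual vocabulary."""
    sift = cv2.SIFT_create()
    hist = np.zeros(kmeans.n_clusters)
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        _, desc = sift.detectAndCompute(gray, None)
        if desc is not None:
            words = kmeans.predict(desc)
            hist += np.bincount(words, minlength=kmeans.n_clusters)
    return hist / max(hist.sum(), 1.0)

def classify_genre(query_frames, train_descriptors, train_videos, train_genres,
                   vocab_size=500, n_neighbors=5):
    """Learn the visual vocabulary, represent every video as a BoW histogram,
    and predict the query video's genre with a k-NN classifier."""
    kmeans = KMeans(n_clusters=vocab_size, n_init=10).fit(train_descriptors)
    train_hists = [bow_histogram(frames, kmeans) for frames in train_videos]
    knn = KNeighborsClassifier(n_neighbors=n_neighbors).fit(train_hists, train_genres)
    return knn.predict([bow_histogram(query_frames, kmeans)])[0]
```

The second-stage PLSA view assignment would consume the same histogram representation, and the third-stage HCRF model would consume the resulting view-label sequence.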
Overall, this framework demonstrates efficiency and generality in processing voluminous data from a large-scale video collection and accomplishes various video analysis tasks. Its effectiveness is justified by extensive experimentation, and the results are compared with benchmarks and state-of-the-art algorithms. In conclusion, with little human expertise and effort involved in either the domain knowledge-independent video representation or the annotation-free unsupervised view labeling, the proposed generic and systematic method based on the BoW model is promising for processing videos and has the potential to scale to even larger and more diverse datasets.