Chapter 10
Audio-Visual Fusion for Film Database Retrieval and Classification
Abstract This chapter presents techniques for the characterization and fusion
of audio and visual content in videos, and demonstrates their applications in movie
database retrieval. In the audio domain, a study is conducted on the peaky nature
of the distribution of wavelet coefficients of an audio signal, which cannot be
effectively modeled by a single distribution. Thus, a new modeling method based
on a Laplacian mixture model is studied for analyzing audio content and extracting
audio features. The dimension of the indexed features is low, which is important
for the retrieval efficiency of the system in terms of response time. Together with
the audio feature, the visual feature is extracted by template frequency modeling.
Both features are referred to as perceptual features. Then, a learning algorithm
for audiovisual fusion is presented. Specifically, the two features are fused at
the late fusion stage and input into a support vector machine to learn semantic
concepts from a given video database. Experimental results show that a system
implementing the support vector machine-based fusion technique achieves high
classification accuracy when applied to a large database of Hollywood movies.
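
To make the mixture modeling idea in the abstract more concrete, the following is a minimal sketch of fitting a Laplacian mixture to a set of coefficients with expectation-maximization. It is not the chapter's own algorithm: the choice of two zero-mean components, the initialization, and the names laplace_pdf and fit_laplacian_mixture are assumptions made for illustration, and synthetic samples stand in for real wavelet coefficients. The fitted weights and scales give a small set of numbers per signal, which is one way such a model could yield a low-dimensional feature.

```python
import math
import random


def laplace_pdf(x, b):
    """Density of a zero-mean Laplacian with scale b."""
    return math.exp(-abs(x) / b) / (2.0 * b)


def fit_laplacian_mixture(coeffs, n_iter=50):
    """EM for a two-component, zero-mean Laplacian mixture (illustrative).

    Returns (weights, scales). The component count and the zero-mean
    assumption are choices made for this sketch, not taken from the chapter.
    """
    n = len(coeffs)
    mean_abs = max(sum(abs(c) for c in coeffs) / n, 1e-12)
    weights = [0.5, 0.5]
    # Start with one narrow and one wide component.
    scales = [0.5 * mean_abs, 2.0 * mean_abs]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each coefficient.
        resp = []
        for c in coeffs:
            joint = [w * laplace_pdf(c, b) for w, b in zip(weights, scales)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: mixing weight and scale (weighted mean of |x|) per component.
        for k in range(2):
            r_k = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = r_k / n
            weighted_abs = sum(r[k] * abs(c) for r, c in zip(resp, coeffs))
            scales[k] = max(weighted_abs / r_k, 1e-12)
    return weights, scales


# Synthetic stand-in for wavelet coefficients: a peaky mix of a narrow
# and a wide Laplacian (the parameters here are arbitrary).
rng = random.Random(0)
samples = []
for _ in range(1000):
    b = 0.1 if rng.random() < 0.7 else 2.0
    samples.append(rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / b))

weights, scales = fit_laplacian_mixture(samples)
feature = weights + scales  # a four-number descriptor for this signal
print(feature)
```

In this sketch the narrow component captures the sharp peak near zero and the wide component captures the heavy tails, which is the behavior a single distribution struggles to reproduce.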
10.1 Introduction
Content-based video retrieval methods are highly applicable to movie-on-demand
and movie production applications. These methods can be implemented by a
recommender system for content-based filtering to assist users in finding
relevant entities according to their individual preferences. A central design
issue for recommender services is how to suggest relevant yet unknown
entities. A system based on video indexing with text descriptors usually provides
highly generic and broad categories. In comparison, a perception-based descriptor
implemented by a content-based recommender system provides a more focused
scope of relevant entities. Such descriptors aggregate several different
modalities to compute relevance. The variety of the integrated modalities allows us
to consider different relevance criteria, helping users to explore new entities. To that