10.5 Summary
This chapter focuses on the content characterization of audio and visual data and on support vector machine learning models for audio-visual fusion. First, we introduce the Laplacian mixture model for audio analysis. The shape of the wavelet coefficient distribution is captured by a low-dimensional vector of model parameters, and this index vector describes the global characteristics of the audio content. Because videos contain mixed types of audio content, namely music, speech, sound effects, and noise, such global characterization is well suited to these mixed audio sources.
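The parameter vector described above can be sketched as follows. This is a minimal illustration, not the chapter's exact procedure: it fits a two-component zero-mean Laplacian mixture to stand-in wavelet coefficients by EM, and concatenates the mixing weights and scale parameters into a low-dimensional index vector. The synthetic data and the choice of two components are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for wavelet coefficients: a mixture of two
# zero-mean Laplacian sources with different scales (assumed data).
coeffs = np.concatenate([rng.laplace(0, 0.5, 2000), rng.laplace(0, 3.0, 500)])

def fit_laplacian_mixture(x, n_iter=100):
    """EM for a two-component zero-mean Laplacian mixture.

    Returns mixing weights w and scale parameters b."""
    w = np.array([0.5, 0.5])   # initial mixing weights
    b = np.array([0.1, 1.0])   # initial scale parameters
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component per coefficient,
        # using the Laplacian density p(x) = (1 / 2b) * exp(-|x| / b)
        dens = w / (2 * b) * np.exp(-np.abs(x)[:, None] / b)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: the MLE of a Laplacian scale is the responsibility-weighted
        # mean absolute deviation; weights are mean responsibilities
        w = resp.mean(axis=0)
        b = (resp * np.abs(x)[:, None]).sum(axis=0) / resp.sum(axis=0)
    return w, b

w, b = fit_laplacian_mixture(coeffs)
# Low-dimensional index vector characterizing the global audio content
feature = np.concatenate([w, b])
```

Because only the fitted parameters are kept, the descriptor stays compact regardless of the length of the audio segment, which is what makes it practical as a global characterization.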
Video data involves both audio and visual signals to convey semantic meaning. Audiovisual fusion therefore provides more accurate content analysis than methods that examine the visual or audio signal alone. Template frequency modeling for visual content analysis, combined with the Laplacian-mixture-based statistical technique for audio analysis, effectively captures the spatio-temporal information. We demonstrate the approach by characterizing semantic concepts in movie clips from a large video library. Through support vector machine training, the audiovisual fusion model adaptively constructs a decision function for classifying videos according to a given concept.
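The fusion-and-training step can be sketched as below. This is a hedged illustration using scikit-learn rather than the chapter's own implementation: audio and visual descriptors are fused by simple concatenation, and an SVM learns a decision function for a concept. The feature dimensions, the synthetic labels, and the RBF kernel choice are all assumptions for the sketch.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n = 200
# Hypothetical per-clip descriptors: a 4-dim audio index vector (e.g. the
# Laplacian mixture parameters) and an 8-dim visual descriptor (e.g. from
# template frequency modeling); dimensions are illustrative assumptions.
audio = rng.normal(0, 1, (n, 4))
visual = rng.normal(0, 1, (n, 8))

# Early fusion: concatenate audio and visual features per clip
X = np.hstack([audio, visual])
# Synthetic binary concept label for demonstration only
y = (X[:, 0] + X[:, 5] > 0).astype(int)

# Train an SVM decision function on the fused descriptors
clf = SVC(kernel="rbf", C=1.0).fit(X[:150], y[:150])
acc = clf.score(X[150:], y[150:])
```

Concatenation before training lets the kernel weigh audio and visual evidence jointly, which is the sense in which the fused classifier can outperform either modality alone.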