to automatically generate such a combined video, called a mashup, with high-quality
content. A multiple-camera recording of a concert provides different viewing
angles from the eyes of the audience, creating a lively experience. However,
the quality of these videos is inconsistent and often low, as they are usually
recorded by non-professionals using hand-held cameras.
We start with a system based on our earlier work, described in [1], that
automatically synchronizes a multiple-camera recording. Synchronization is
necessary for seamless continuity between consecutive audio-visual segments. We
use audio features, namely fingerprints and onsets, to find synchronization
offsets among the recordings. The idea is that during a concert, multiple
cameras record the same audio at least for a short duration, even if they are
pointing at different objects. The method requires a minimum of 3 seconds of
common audio between the recordings. It is robust against signal degradations
and computes synchronization offsets with a high precision of 11.6 ms. We also
verify manually that all the recordings used in mashup generation are
accurately synchronized.
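The offset estimation described above can be sketched with a simple cross-correlation of onset sequences. This is an illustrative stand-in, not the fingerprint-based method of [1]; the function name, the binary onset representation, and the 100 Hz frame rate are assumptions.

```python
import numpy as np

def sync_offset(onsets_a, onsets_b, fs=100):
    """Estimate the synchronization offset (in seconds) between two
    binary onset sequences sampled at fs frames per second, as the lag
    that maximizes their cross-correlation."""
    corr = np.correlate(onsets_a, onsets_b, mode="full")
    lag = np.argmax(corr) - (len(onsets_b) - 1)
    return lag / fs

# Two recordings sharing the same onset pattern; the second starts 0.5 s later.
pattern = np.zeros(300)
pattern[[40, 90, 170, 220]] = 1.0
rec_a = np.concatenate([pattern, np.zeros(100)])
rec_b = np.concatenate([np.zeros(50), pattern, np.zeros(50)])
print(sync_offset(rec_b, rec_a))  # 0.5
```

In practice the onset sequences would be extracted from the audio tracks of the recordings, and the correlation peak would only be trusted when the overlap exceeds the 3-second minimum noted above.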
In this paper, we describe a method for evaluating video signal quality and
selecting high-quality segments in order to facilitate the automated generation
of mashups. To this end, we identify different factors describing video quality,
such as blockiness and shakiness. We measure these factors by applying different
content-analysis techniques and compute the final quality by combining the
measured factor values. The quality measurement is performed and tested on
mashups generated from non-professional multiple-camera concert recordings
from YouTube.
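The combination step can be sketched as a weighted average of per-factor scores. The factor names, weights, and the averaging scheme here are illustrative assumptions, not the values or formula used in the paper.

```python
def combined_quality(factors, weights):
    """Combine per-factor quality scores in [0, 1] into a single score
    via a normalized weighted average. Higher is better."""
    total = sum(weights[f] for f in factors)
    return sum(weights[f] * factors[f] for f in factors) / total

# Hypothetical scores for one video segment (1.0 = no degradation).
segment = {"blockiness": 0.8, "shakiness": 0.6, "sharpness": 0.9}
weights = {"blockiness": 1.0, "shakiness": 2.0, "sharpness": 1.0}
print(combined_quality(segment, weights))  # 0.725
```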
2 Video Quality Analysis
Quality metrics known from video compression, such as mean square error or
peak signal-to-noise ratio, are not applicable to our problem, because no
information about the actual scene or the camera settings is available to serve
as a reference for estimating the signal quality. Therefore, we employ a
no-reference (also called blind) quality assessment method, which estimates
image quality based on objective measures of different features that influence
the perception of quality.
Prior work on no-reference image quality estimation has been done in different
contexts, such as removing artifacts from home videos [2], developing perceptual
quality models [3,4], summarizing home videos [5], and estimating network
performance in real-time video transmission [6]. In [2], lighting and shaking
artifacts in home videos are first detected and measured, and then removed. The
quality of a JPEG-compressed image is estimated in [4] according to the
blockiness and blurriness measured in the image, and in [3] according to edge
sharpness, random noise level, ringing artifacts, and blockiness. In [5], the
quality of a home video is measured according to spatial features (infidelity,
brightness, blurriness, orientation) and temporal features (jerkiness,
instability). The features are measured not in every frame but over a temporal
video segment. In [6], video quality is measured based on spatial distortions
and temporal activities along the frames.
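As a concrete illustration of such a no-reference feature, blockiness can be crudely estimated from luminance jumps at the 8-pixel block grid of typical DCT-based codecs. This is a minimal sketch under that assumption, not the specific metric of [3] or [4].

```python
import numpy as np

def blockiness(gray, block=8):
    """Crude no-reference blockiness estimate for a grayscale frame:
    the ratio of the mean absolute horizontal luminance jump at block
    boundaries to the mean jump elsewhere. Values well above 1 suggest
    visible block artifacts; a smooth image scores near 1."""
    diff = np.abs(np.diff(gray.astype(float), axis=1))  # horizontal gradients
    cols = np.arange(diff.shape[1])
    at_boundary = (cols % block) == (block - 1)         # jumps straddling block edges
    return diff[:, at_boundary].mean() / (diff[:, ~at_boundary].mean() + 1e-9)

# A frame of flat 8x8 blocks scores high; a smooth gradient scores near 1.
blocky = np.tile(np.repeat(np.arange(4) * 60.0, 8), (8, 1))
smooth = np.tile(np.arange(32, dtype=float), (8, 1))
print(blockiness(blocky) > blockiness(smooth))  # True
```

A vertical-gradient pass over the rows would normally be averaged in as well; it is omitted here for brevity.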