to automatically generate such a combined video, called a mashup, with high-quality
content. A multiple-camera recording of a concert provides different viewing
angles from the eyes of the audience, creating a lively experience. However,
the quality of these videos is inconsistent and often low, as they are usually
recorded by non-professionals using hand-held cameras.
We start with a system based on our earlier work, described in [1], that
automatically synchronizes a multiple-camera recording. Synchronization is
necessary for seamless continuity between consecutive audio-visual segments. We
use audio features, namely fingerprints and onsets, to find synchronization
offsets among the recordings. The idea is that during a concert, multiple
cameras record the same audio at least for a short duration, even if they are
pointing at different objects. The method requires a minimum of 3 seconds of
common audio between the recordings. It is robust against signal degradations
and computes synchronization offsets with a high precision of 11.6 ms. We also
verify manually that all the recordings used in mashup generation are
accurately synchronized.
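The offset estimation described above can be sketched with a simple cross-correlation of onset sequences. This is an illustrative stand-in, not the fingerprint-based method of [1]; the function name, the binary onset representation, and the 100 Hz frame rate are assumptions.

```python
import numpy as np

def sync_offset(onsets_a, onsets_b, fs=100):
    """Estimate the synchronization offset (in seconds) between two
    binary onset sequences sampled at fs frames per second, as the lag
    that maximizes their cross-correlation."""
    corr = np.correlate(onsets_a, onsets_b, mode="full")
    lag = np.argmax(corr) - (len(onsets_b) - 1)
    return lag / fs

# Two recordings sharing the same onset pattern; the second starts 0.5 s later.
pattern = np.zeros(300)
pattern[[40, 90, 170, 220]] = 1.0
rec_a = np.concatenate([pattern, np.zeros(100)])
rec_b = np.concatenate([np.zeros(50), pattern, np.zeros(50)])
print(sync_offset(rec_b, rec_a))  # 0.5
```

In practice the onset sequences would be extracted from the audio tracks of the recordings, and the correlation peak would only be trusted when the overlap exceeds the 3-second minimum noted above.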
In this paper, we describe a method for evaluating video signal quality and
selecting high-quality segments in order to facilitate the automated generation
of mashups. To this end, we identify different factors describing video quality,
such as blockiness and shakiness. We measure these factors by applying different
content-analysis techniques and compute the final quality by combining the
measured factor values. The quality measurement is performed and tested on
mashups generated from non-professional multiple-camera concert recordings
from YouTube.
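The combination step can be sketched as a weighted average of per-factor scores. The factor names, weights, and the averaging scheme here are illustrative assumptions, not the values or formula used in the paper.

```python
def combined_quality(factors, weights):
    """Combine per-factor quality scores in [0, 1] into a single score
    via a normalized weighted average. Higher is better."""
    total = sum(weights[f] for f in factors)
    return sum(weights[f] * factors[f] for f in factors) / total

# Hypothetical scores for one video segment (1.0 = no degradation).
segment = {"blockiness": 0.8, "shakiness": 0.6, "sharpness": 0.9}
weights = {"blockiness": 1.0, "shakiness": 2.0, "sharpness": 1.0}
print(combined_quality(segment, weights))  # 0.725
```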
2 Video Quality Analysis
Quality metrics known from video compression, such as mean square error or
peak signal-to-noise ratio, are not applicable to our problem, because no
information about the actual scene or the camera settings is available to serve
as a reference for estimating the signal quality. Therefore, we employ a
no-reference (also called blind) quality assessment method, which estimates
image quality based on objective measures of different features that influence
the perception of quality.
Prior work on no-reference image quality estimation has been done in different
contexts, such as removing artifacts from home videos [2], developing perceptual
quality models [3,4], summarizing home videos [5], and estimating network
performance in real-time video transmission [6]. In [2], lighting and shaking
artifacts in home videos are first detected and measured, and then removed. The
quality of a JPEG-compressed image is estimated in [4] according to the
blockiness and blurriness measured in the image, and in [3] according to edge
sharpness, random noise level, ringing artifacts, and blockiness. In [5], the
quality of a home video is measured according to spatial features (infidelity,
brightness, blurriness, orientation) and temporal features (jerkiness,
instability). The features are measured not in every frame but over a temporal
video segment. In [6], video quality is measured based on spatial distortions
and temporal activities along the frames.
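As a concrete illustration of such a no-reference feature, blockiness can be crudely estimated from luminance jumps at the 8-pixel block grid of typical DCT-based codecs. This is a minimal sketch under that assumption, not the specific metric of [3] or [4].

```python
import numpy as np

def blockiness(gray, block=8):
    """Crude no-reference blockiness estimate for a grayscale frame:
    the ratio of the mean absolute horizontal luminance jump at block
    boundaries to the mean jump elsewhere. Values well above 1 suggest
    visible block artifacts; a smooth image scores near 1."""
    diff = np.abs(np.diff(gray.astype(float), axis=1))  # horizontal gradients
    cols = np.arange(diff.shape[1])
    at_boundary = (cols % block) == (block - 1)         # jumps straddling block edges
    return diff[:, at_boundary].mean() / (diff[:, ~at_boundary].mean() + 1e-9)

# A frame of flat 8x8 blocks scores high; a smooth gradient scores near 1.
blocky = np.tile(np.repeat(np.arange(4) * 60.0, 8), (8, 1))
smooth = np.tile(np.arange(32, dtype=float), (8, 1))
print(blockiness(blocky) > blockiness(smooth))  # True
```

A vertical-gradient pass over the rows would normally be averaged in as well; it is omitted here for brevity.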