extracted on a frame-by-frame basis and the mean is utilized for processing. A set of test videos and multivariate data analysis [33] are used to compute a feature matrix F. A multiplicative signal correction (MSC) is performed using linear regression to account for correlations between features, yielding a corrected feature matrix. Partial least squares regression (PLSR) is then used to map the feature vectors onto subjective ratings. Up to this point the method described is NR. However, the authors use a quality estimate from the original video in order to improve NR VQA performance, thus creating an RR VQA algorithm. The authors demonstrate high correlation with human perception. The use of the original video for the NR-to-RR transition is non-standard. Further, the innovativeness of the algorithm hinges on the use of multivariate data analysis, since the features used have been previously proposed in the literature.
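To make the pipeline concrete, a minimal sketch of the MSC and PLSR stages is given below, assuming a precomputed per-video feature matrix and a vector of subjective ratings; the placeholder data, the simple row-wise MSC and the number of PLSR components are illustrative assumptions rather than the settings of the original method.

import numpy as np
from sklearn.cross_decomposition import PLSRegression

def msc(F):
    # Multiplicative signal correction: regress each row (one video's
    # frame-averaged features) on the mean row and undo gain/offset.
    mean_row = F.mean(axis=0)
    F_corr = np.empty_like(F, dtype=float)
    for i, row in enumerate(F):
        b, a = np.polyfit(mean_row, row, deg=1)   # row ~ a + b * mean_row
        F_corr[i] = (row - a) / b
    return F_corr

# Placeholder data: 40 test videos, 12 frame-averaged features, MOS ratings.
F = np.random.rand(40, 12)
mos = 100.0 * np.random.rand(40)

F_corr = msc(F)
pls = PLSRegression(n_components=4)       # component count is an assumption
pls.fit(F_corr, mos)
predicted_quality = pls.predict(F_corr).ravel()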
Neural Network based RR VQA. Le Callet et al. proposed a time-delay neural network (TDNN) [34] based RR VQA index in [35]. The algorithm follows the general description of an RR algorithm, with extracted features borrowed from previous works, including the power of frame differences, and blocking and frequency-content measures. Their main contribution is the use of a TDNN to perform a temporal integration of these indicators without specifying a particular form for temporal pooling. A small-scale evaluation is undertaken and reasonable performance is demonstrated.
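The temporal integration can be sketched as a small one-dimensional convolutional network operating along the time axis, which is a modern rendering of a TDNN; the three input indicators, layer sizes and kernel width below are illustrative assumptions, not the configuration reported in [35].

import torch
import torch.nn as nn

class TDNNPooling(nn.Module):
    def __init__(self, n_features=3, hidden=16, kernel_size=5):
        super().__init__()
        # Conv1d over the time axis acts as a time-delay layer: each output
        # frame sees a sliding window of `kernel_size` neighbouring frames.
        self.net = nn.Sequential(
            nn.Conv1d(n_features, hidden, kernel_size, padding=kernel_size // 2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size, padding=kernel_size // 2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # collapse the temporal axis
            nn.Flatten(),
            nn.Linear(hidden, 1),      # single quality score per video
        )

    def forward(self, x):              # x: (batch, n_features, n_frames)
        return self.net(x).squeeze(-1)

# Example: 8 videos, 3 per-frame indicators (frame-difference power,
# blocking, frequency content), 250 frames each.
scores = TDNNPooling()(torch.randn(8, 3, 250))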
Foveation-based RR VQA. Meng et al. proposed an algorithm for HD RR VQA based on features extracted from spatio-temporal (ST) regions of a video [36]. The extracted features and the ST regions are based on the ideas proposed by Wolf and Pinson [29]. Features extracted from the original video are sent to the receiver over an ancillary channel carrying the RR information. Since humans perceive regions within the fovea with higher visual acuity (a fact that is particularly pertinent for HD video), the authors divide the video into foveal, parafoveal and peripheral regions, in which the ST regions are computed with increasing coarseness. The authors claim that the use of these different regions improves performance; however, a thorough analysis of performance is lacking.
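A rough sketch of such foveated ST feature extraction is shown below, assuming the fixation point is the frame centre and that coarseness is controlled by the ST block size; the radii, block sizes, GOP length and block statistics (mean and standard deviation) are placeholders and not the settings of [36].

import numpy as np

def st_features(video, t0, t_len, y0, x0, block):
    # Mean and std of pixel values in one spatio-temporal block.
    cube = video[t0:t0 + t_len, y0:y0 + block, x0:x0 + block]
    return cube.mean(), cube.std()

def foveated_features(video, fix_y, fix_x, t_len=8):
    T, H, W = video.shape
    # foveal / parafoveal / peripheral: larger block size = coarser sampling
    regions = [(0, H // 8, 8), (H // 8, H // 4, 16), (H // 4, None, 32)]
    feats = []
    for r_min, r_max, block in regions:
        for t0 in range(0, T - t_len + 1, t_len):
            for y0 in range(0, H - block + 1, block):
                for x0 in range(0, W - block + 1, block):
                    r = np.hypot(y0 + block / 2 - fix_y, x0 + block / 2 - fix_x)
                    if r >= r_min and (r_max is None or r < r_max):
                        feats.append(st_features(video, t0, t_len, y0, x0, block))
    return np.array(feats)

video = np.random.rand(32, 288, 512)      # placeholder clip (downscaled HD)
features = foveated_features(video, fix_y=144, fix_x=256)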
Quality Aware Video. Extending the work in [37] for images, Hiremath et al. proposed an algorithm based on natural video statistics for RR VQA in [38]. A video is divided into groups of pictures (GOPs) and each frame in a GOP is decomposed using a steerable pyramid [39] (an overcomplete wavelet transform). Subbands at the same orientation and scale but from different frames are then aligned to obtain H(s, p, t), where s is the scale, p is the orientation (translation factor for the wavelet) and t represents the frame. The authors then compute
the second-order temporal difference of the log subband coefficients, $L_2(s, p, t) = \sum_{n=0}^{2} (-1)^n \binom{2}{n} \log H(s, p, t+n) = \log H(s,p,t) - 2\log H(s,p,t+1) + \log H(s,p,t+2)$. The histogram of L_2 is peaked at zero with heavy tails, and it is fitted with a four-parameter logistic function. The four parameters of the fit and the KL divergence [40] between the fit and the actual distribution for each subband in each GOP form the RR features.
Further, the marginal distribution of the coefficients in each subband is fitted with a generalized Gaussian model, whose parameters serve as additional RR features. The RR features are embedded in the video itself.
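A sketch of this per-subband RR feature computation is given below. It assumes the aligned subband H(s, p, t) for one (scale, orientation) pair is available as a (frames, height, width) array, in practice produced by a steerable pyramid decomposition of each frame in the GOP; taking the logarithm of coefficient magnitudes, the particular logistic parameterisation, the bin count and the placeholder data are all assumptions.

import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import entropy, gennorm

def second_order_log_diff(H, eps=1e-6):
    # L2(s, p, t) = log H(t) - 2 log H(t+1) + log H(t+2), element-wise;
    # magnitudes are used so the logarithm is defined for signed coefficients.
    logH = np.log(np.abs(H) + eps)
    return logH[:-2] - 2.0 * logH[1:-1] + logH[2:]

def logistic4(x, a, b, c, d):
    # Four-parameter logistic curve used to model the L2 histogram.
    return d + (a - d) / (1.0 + np.exp((x - c) / b))

def rr_features(H, bins=64):
    L2 = second_order_log_diff(H).ravel()
    hist, edges = np.histogram(L2, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    params, _ = curve_fit(logistic4, centers, hist,
                          p0=[hist.max(), 1.0, 0.0, 0.0], maxfev=10000)
    fit = np.clip(logistic4(centers, *params), 1e-12, None)
    kld = entropy(hist + 1e-12, fit)              # KL divergence, data vs. fit
    beta, loc, scale = gennorm.fit(H.ravel())     # GGD fit of the marginal
    return np.concatenate([params, [kld, beta, scale]])

H = np.random.randn(12, 64, 64)   # placeholder aligned subband over one GOP
features = rr_features(H)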
 