extracted on a frame-by-frame basis and the mean is utilized for processing. A set of test videos and multivariate data analysis [33] are used to compute a feature matrix F. A multiplicative signal correction (MSC) is performed using linear regression to account for correlations between features, yielding a corrected feature matrix. Partial least squares regression (PLSR) is then used to map the feature vectors onto subjective ratings. Up to this point the method described is NR. However, the authors use a quality estimate from the original video in order to improve NR VQA performance, thus creating an RR VQA algorithm. The authors demonstrate high correlation with human perception. The use of the original video for the NR-to-RR transition is non-standard. Further, the innovativeness of the algorithm hinges on the use of multivariate data analysis, since the features used have been previously proposed in the literature.
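To make the pipeline concrete, a minimal sketch of the MSC and PLSR stages is given below, assuming a precomputed per-video feature matrix and a vector of subjective ratings; the placeholder data, the simple row-wise MSC and the number of PLSR components are illustrative assumptions rather than the settings of the original method.

import numpy as np
from sklearn.cross_decomposition import PLSRegression

def msc(F):
    # Multiplicative signal correction: regress each row (one video's
    # frame-averaged features) on the mean row and undo gain/offset.
    mean_row = F.mean(axis=0)
    F_corr = np.empty_like(F, dtype=float)
    for i, row in enumerate(F):
        b, a = np.polyfit(mean_row, row, deg=1)   # row ~ a + b * mean_row
        F_corr[i] = (row - a) / b
    return F_corr

# Placeholder data: 40 test videos, 12 frame-averaged features, MOS ratings.
F = np.random.rand(40, 12)
mos = 100.0 * np.random.rand(40)

F_corr = msc(F)
pls = PLSRegression(n_components=4)       # component count is an assumption
pls.fit(F_corr, mos)
predicted_quality = pls.predict(F_corr).ravel()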
Neural Network based RR VQA. Le Callet et al. proposed a time-delay neural network (TDNN) [34] based RR VQA index in [35]. The algorithm follows the general description of an RR algorithm, with extracted features borrowed from previous works, including the power of frame differences, and blocking and frequency-content measures. Their main contribution is the use of a TDNN to perform a temporal integration of these indicators without specifying a particular form for temporal pooling. A small-scale evaluation is undertaken and reasonable performance is demonstrated.
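The temporal integration can be sketched as a small one-dimensional convolutional network operating along the time axis, which is a modern rendering of a TDNN; the three input indicators, layer sizes and kernel width below are illustrative assumptions, not the configuration reported in [35].

import torch
import torch.nn as nn

class TDNNPooling(nn.Module):
    def __init__(self, n_features=3, hidden=16, kernel_size=5):
        super().__init__()
        # Conv1d over the time axis acts as a time-delay layer: each output
        # frame sees a sliding window of `kernel_size` neighbouring frames.
        self.net = nn.Sequential(
            nn.Conv1d(n_features, hidden, kernel_size, padding=kernel_size // 2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size, padding=kernel_size // 2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # collapse the temporal axis
            nn.Flatten(),
            nn.Linear(hidden, 1),      # single quality score per video
        )

    def forward(self, x):              # x: (batch, n_features, n_frames)
        return self.net(x).squeeze(-1)

# Example: 8 videos, 3 per-frame indicators (frame-difference power,
# blocking, frequency content), 250 frames each.
scores = TDNNPooling()(torch.randn(8, 3, 250))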
Foveation-based RR VQA. Meng et al. proposed an algorithm for HD RR VQA based on features extracted from spatio-temporal (ST) regions of a video [36]. The extracted features and the ST regions are based on the ideas proposed by Wolf and Pinson [29]. Features extracted from the original video are sent to the receiver over an ancillary channel carrying the RR information. Since humans perceive regions within the fovea with higher visual acuity (a fact that is particularly pertinent for HD video), the authors divide the video into foveal, parafoveal and peripheral regions, in which the ST regions are computed with increasing coarseness. The authors claim that the use of these different regions improves performance; however, a thorough analysis of performance is lacking.
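A rough sketch of such foveated ST feature extraction is shown below, assuming the fixation point is the frame centre and that coarseness is controlled by the ST block size; the radii, block sizes, GOP length and block statistics (mean and standard deviation) are placeholders and not the settings of [36].

import numpy as np

def st_features(video, t0, t_len, y0, x0, block):
    # Mean and std of pixel values in one spatio-temporal block.
    cube = video[t0:t0 + t_len, y0:y0 + block, x0:x0 + block]
    return cube.mean(), cube.std()

def foveated_features(video, fix_y, fix_x, t_len=8):
    T, H, W = video.shape
    # foveal / parafoveal / peripheral: larger block size = coarser sampling
    regions = [(0, H // 8, 8), (H // 8, H // 4, 16), (H // 4, None, 32)]
    feats = []
    for r_min, r_max, block in regions:
        for t0 in range(0, T - t_len + 1, t_len):
            for y0 in range(0, H - block + 1, block):
                for x0 in range(0, W - block + 1, block):
                    r = np.hypot(y0 + block / 2 - fix_y, x0 + block / 2 - fix_x)
                    if r >= r_min and (r_max is None or r < r_max):
                        feats.append(st_features(video, t0, t_len, y0, x0, block))
    return np.array(feats)

video = np.random.rand(32, 288, 512)      # placeholder clip (downscaled HD)
features = foveated_features(video, fix_y=144, fix_x=256)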
Quality Aware Video. Extending the work in [37] for images, Hiremath et al. proposed an algorithm based on natural video statistics for RR VQA in [38]. A video is divided into groups of pictures (GOPs) and each frame in a GOP is decomposed using a steerable pyramid [39] (an overcomplete wavelet transform). Subbands at the same orientation and scale but from different frames are then aligned to obtain H(s, p, t), where s is the scale, p is the orientation (translation factor for the wavelet) and t represents the frame. The authors then compute
the second-order temporal difference of the log subband coefficients, $L_2(s, p, t) = \sum_{n=0}^{2} (-1)^n \binom{2}{n} \log H(s, p, t+n) = \log H(s,p,t) - 2\log H(s,p,t+1) + \log H(s,p,t+2)$. The histogram of L_2 is peaked at zero with heavy tails, and it is fitted with a four-parameter logistic function. The four parameters of the fit and the KL divergence [40] between the fit and the actual distribution for each subband in each GOP form the RR features.
Further, the marginal distribution of the coefficients in each subband is fitted with a generalized Gaussian model, whose parameters serve as additional RR features. The RR features are embedded in the video itself.
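A sketch of this per-subband RR feature computation is given below. It assumes the aligned subband H(s, p, t) for one (scale, orientation) pair is available as a (frames, height, width) array, in practice produced by a steerable pyramid decomposition of each frame in the GOP; taking the logarithm of coefficient magnitudes, the particular logistic parameterisation, the bin count and the placeholder data are all assumptions.

import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import entropy, gennorm

def second_order_log_diff(H, eps=1e-6):
    # L2(s, p, t) = log H(t) - 2 log H(t+1) + log H(t+2), element-wise;
    # magnitudes are used so the logarithm is defined for signed coefficients.
    logH = np.log(np.abs(H) + eps)
    return logH[:-2] - 2.0 * logH[1:-1] + logH[2:]

def logistic4(x, a, b, c, d):
    # Four-parameter logistic curve used to model the L2 histogram.
    return d + (a - d) / (1.0 + np.exp((x - c) / b))

def rr_features(H, bins=64):
    L2 = second_order_log_diff(H).ravel()
    hist, edges = np.histogram(L2, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    params, _ = curve_fit(logistic4, centers, hist,
                          p0=[hist.max(), 1.0, 0.0, 0.0], maxfev=10000)
    fit = np.clip(logistic4(centers, *params), 1e-12, None)
    kld = entropy(hist + 1e-12, fit)              # KL divergence, data vs. fit
    beta, loc, scale = gennorm.fit(H.ravel())     # GGD fit of the marginal
    return np.concatenate([params, [kld, beta, scale]])

H = np.random.randn(12, 64, 64)   # placeholder aligned subband over one GOP
features = rr_features(H)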
 