Video Quality Assessment based on Data Hiding. Farias et al. proposed a VQA
algorithm based on data hiding in [27]. Even though the authors claim that the
method is NR, a method that incorporates any information at the source end which is
utilized at the receiver end for QA is, as per our definition, RR VQA. The proposed
algorithm is based on the technique of watermarking. At the source end, an 8 × 8
block Discrete Cosine Transform (DCT) of the frame is performed. A binary mask
is multiplied by an uncorrelated pseudo-random noise matrix, which is rescaled and
added to the medium frequency coefficients from the DCT. A block is selected for
embedding only if the amount of motion (estimated using a block motion estimation
algorithm) exceeds a certain threshold (T_mov). At the receiver end, an inverse process
is performed and the mark is extracted. The measure of degradation of the video is
then the total squared error between the original mask and the retrieved mask. The
authors do not report the statistical measures of performance discussed earlier.
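To make the data-hiding pipeline concrete, the following Python sketch embeds a pseudo-random spreading of binary mask bits into the mid-frequency DCT coefficients of 8 × 8 blocks whose motion exceeds a threshold, and scores degradation at the receiver as the squared error between the embedded and retrieved masks. The embedding strength, motion threshold, mid-frequency positions, correlation-based recovery, and the function names are illustrative assumptions, not the exact design of [27].

import numpy as np
from scipy.fft import dctn, idctn

BLOCK, ALPHA, T_MOV = 8, 2.0, 4.0     # block size, embedding strength, motion threshold (assumed)
MID_BAND = [(1, 3), (2, 2), (3, 1), (1, 4), (4, 1), (2, 3), (3, 2)]   # assumed mid-frequency positions

def block_iter(h, w):
    for y in range(0, h - BLOCK + 1, BLOCK):
        for x in range(0, w - BLOCK + 1, BLOCK):
            yield y, x

def embed(frame, mask_bits, motion, seed=0):
    """Embed +/-1 mask bits, spread by pseudo-random noise, into high-motion blocks.
    mask_bits: 1-D array of +/-1 values derived from the binary mask.
    motion: per-block motion magnitudes (one value per 8x8 block)."""
    rng = np.random.default_rng(seed)
    marked = frame.astype(np.float64).copy()
    i = 0
    for y, x in block_iter(*frame.shape):
        if motion[y // BLOCK, x // BLOCK] < T_MOV:      # embed only where motion exceeds T_mov
            continue
        c = dctn(marked[y:y+BLOCK, x:x+BLOCK], norm='ortho')
        noise = rng.choice([-1.0, 1.0], size=len(MID_BAND))
        for (u, v), n in zip(MID_BAND, noise):
            c[u, v] += ALPHA * mask_bits[i % len(mask_bits)] * n
        marked[y:y+BLOCK, x:x+BLOCK] = idctn(c, norm='ortho')
        i += 1
    return marked

def extract_and_score(received, mask_bits, motion, seed=0):
    """Retrieve the mask by correlating with the same noise; score is the squared error."""
    rng = np.random.default_rng(seed)                   # same pseudo-random sequence as the embedder
    retrieved = []
    for y, x in block_iter(*received.shape):
        if motion[y // BLOCK, x // BLOCK] < T_MOV:
            continue
        c = dctn(received[y:y+BLOCK, x:x+BLOCK].astype(np.float64), norm='ortho')
        noise = rng.choice([-1.0, 1.0], size=len(MID_BAND))
        corr = sum(c[u, v] * n for (u, v), n in zip(MID_BAND, noise))
        retrieved.append(1.0 if corr > 0 else -1.0)
    sent = np.array([mask_bits[k % len(mask_bits)] for k in range(len(retrieved))])
    return float(np.sum((sent - np.array(retrieved)) ** 2))   # degradation measure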
Data hiding in perceptually important areas. Carli et al. proposed a block-based
spread-spectrum method for RR VQA in [28]. The proposed method is similar to
that in [27], and differs only in where the watermark is placed. Regions of
perceptual importance are computed using motion information, contrast and color.
The watermark is embedded only in those areas that are perceptually important,
since degradation in these areas is far more significant. A single video at different
bit-rates is used to demonstrate performance.
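A minimal sketch of how such a per-block importance map might be formed from motion, contrast and color cues is given below. The individual cues, their normalization, the equal weighting and the threshold are assumptions for illustration rather than the exact formulation of [28].

import numpy as np

def block_reduce(img, block=8, fn=np.mean):
    """Reduce an image to one value per block x block tile."""
    h, w = img.shape[0] // block, img.shape[1] // block
    return fn(img[:h*block, :w*block].reshape(h, block, w, block), axis=(1, 3))

def importance_map(prev_y, cur_y, saturation, block=8, threshold=0.5):
    """Boolean per-block map marking blocks deemed perceptually important."""
    motion = block_reduce(np.abs(cur_y.astype(float) - prev_y.astype(float)), block)
    contrast = block_reduce(cur_y.astype(float), block, fn=np.std)   # local luminance spread
    color = block_reduce(saturation.astype(float), block)            # simple colorfulness cue
    cues = [motion, contrast, color]
    cues = [(c - c.min()) / (np.ptp(c) + 1e-9) for c in cues]        # normalize each cue to [0, 1]
    score = sum(cues) / len(cues)                                    # equal (assumed) weighting
    return score > threshold          # the watermark would be embedded only where True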
Limitations of watermarking-based approaches include the fact that squared error
(which is generally computed at the receiver as the measure of quality) does not
relate well to human perception, and that the degradation of a watermark may not be
proportional to the perceptual degradation of the video.
4.2 Other Techniques
Low Bandwidth RR VQA. Using features proposed by the authors in [29], Wolf
and Pinson developed a RR VQA model in [30]. A spatio-temporal (ST) region
consisting of 32 pixels × 32 pixels × 1 second is used to extract three features.
Further, a temporal RR feature, which is essentially a difference between
time-staggered ST regions, is also computed. At the receiver, the same set of
parameters is extracted and then a logarithmic ratio or an error ratio between the
(thresholded) features is computed. Finally, Minkowski pooling is undertaken to
form a quality score for the video. The authors claim that the added RR information
amounts to only about 10 kbit/s.
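The following Python sketch illustrates this reduced-reference comparison: a feature is pooled over 32 × 32 pixel × 1 second ST regions, source and received features are compared with a clamped logarithmic ratio, and the differences are Minkowski-pooled into a single score. The specific feature (spatial gradient energy), the perceptibility threshold and the Minkowski exponent are illustrative assumptions, not the exact parameters of [30].

import numpy as np

ST_X = ST_Y = 32          # spatial extent of an ST region (pixels)
EPS = 1.0                 # perceptibility threshold (assumed)
P = 3.0                   # Minkowski exponent (assumed)

def st_features(frames):
    """One feature per 32 x 32 x T region: RMS of the spatial gradient magnitude."""
    frames = np.asarray(frames, dtype=np.float64)        # shape (T, H, W), roughly 1 second of frames
    gy, gx = np.gradient(frames, axis=(1, 2))
    energy = np.sqrt(gy**2 + gx**2)
    t, h, w = energy.shape
    h, w = (h // ST_Y) * ST_Y, (w // ST_X) * ST_X
    blocks = energy[:, :h, :w].reshape(t, h // ST_Y, ST_Y, w // ST_X, ST_X)
    return np.sqrt(np.mean(blocks**2, axis=(0, 2, 4)))   # one value per ST region

def log_ratio(src, rcv):
    """Clamped logarithmic comparison of source and received features."""
    return np.log10(np.maximum(rcv, EPS) / np.maximum(src, EPS))

def quality_score(src_frames, rcv_frames):
    d = log_ratio(st_features(src_frames), st_features(rcv_frames))
    return float(np.mean(np.abs(d) ** P) ** (1.0 / P))   # Minkowski pooling over ST regions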
Multivariate Data Analysis based VQA. Oelbaum and Diepold utilized a multivariate
data analysis approach, where the HVS is modeled as a black box with some
input features [31, 32]. The output of this box is the visual quality of the video. The
authors utilize previously proposed features for NR IQA, including blur, blocking
and video 'detail'. They also extract noise and predictability using simple techniques.
Edge, motion and color continuity form the rest of the features. Features are