Automatic Prediction of Perceptual Video Quality: Recent Trends and Research Directions - High-Quality Visual Experience


with the subjective MOS be made publicly available. One such publicly available dataset for VQA is the popular Video Quality Experts Group (VQEG) FRTV Phase-I dataset [6]. The VQEG dataset consists of 20 reference videos, each subjected to 16 different distortions, for a total of 320 distorted videos. In [6], a study of various algorithms was conducted on this dataset, and it was shown that none of the assessed algorithms was statistically better than peak signal-to-noise ratio (PSNR)¹! Over the years, many new FR VQA algorithms that perform well on this dataset have been proposed [9, 10]. However, the VQEG dataset is not without its drawbacks. The dataset is dated: the report of the study was released in 2000. Previous-generation compression techniques such as MPEG [11] were used to produce the distortions, whereas current-generation compression standards such as H.264/AVC [12] exhibit different perceptual distortions; hence a database that covers the H.264/AVC compression standard is relevant for modern systems. Further, the perceptual separation of videos in the VQEG dataset is poor, leading to inconsistent judgments by both humans and algorithms. To alleviate many such problems associated with the VQEG dataset, researchers from the Laboratory for Image and Video Engineering (LIVE) have created two new VQA datasets. The LIVE databases are now available for non-commercial research purposes; information may be found online [13, 14]. The LIVE VQA datasets include modern-day compression techniques such as H.264/AVC as well as different channel-induced distortions. Descriptions of the datasets and the evaluated algorithms may be found in [15] and [16].
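Since PSNR is the baseline that no algorithm in the VQEG study could beat, it helps to see how simple that baseline is. The following is a minimal sketch, assuming 8-bit samples (peak value 255) and frames flattened into lists of luma values; the helper names `psnr` and `video_psnr` are hypothetical, and pooling by averaging per-frame PSNR is one common convention rather than necessarily the one used in [6].

```python
import math


def psnr(reference, distorted, max_value=255):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10 * math.log10(max_value ** 2 / mse)


def video_psnr(reference_frames, distorted_frames, max_value=255):
    """Average of per-frame PSNR values (an assumed pooling convention)."""
    if len(reference_frames) != len(distorted_frames) or not reference_frames:
        raise ValueError("videos must be non-empty and have the same number of frames")
    scores = [psnr(r, d, max_value) for r, d in zip(reference_frames, distorted_frames)]
    return sum(scores) / len(scores)


# Hypothetical 8-bit frames, flattened to lists of luma values.
ref = [[16, 32, 64, 128], [20, 40, 80, 160]]
dist = [[17, 31, 64, 130], [20, 42, 79, 158]]
print(video_psnr(ref, dist))
```

The measure depends only on pixel-wise squared error, which is why it can disagree with perceived quality.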

Now that we have a dataset with subjective MOS and scores from an algorithm, our goal is to study the correlation between them. To do so, Spearman's rank-order correlation coefficient (SROCC) [17] is generally used [6]. An SROCC of 1 indicates that the two sets of data under study are perfectly rank-correlated, i.e., the algorithm orders the videos exactly as the MOS does.
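To make the definition concrete: SROCC is Pearson's correlation computed on the ranks of the two score sets, with tied values assigned the average of the ranks they occupy. The sketch below is a minimal, stdlib-only illustration; the helper names (`average_ranks`, `pearson`, `srocc`) and the toy scores are hypothetical. In practice a library routine such as `scipy.stats.spearmanr` would typically be used instead.

```python
import math
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's linear correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def srocc(x, y):
    """Spearman's rank-order correlation: Pearson's correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores: MOS grows non-linearly but monotonically with the objective score.
objective = [1.0, 2.0, 3.0, 4.0, 5.0]
mos = [10.0, 12.0, 20.0, 45.0, 90.0]
print(srocc(objective, mos))    # → 1.0 (perfect rank agreement)
print(pearson(objective, mos))  # linear correlation is below 1
```

Because only ranks enter the computation, SROCC rewards an algorithm for ordering the videos correctly, regardless of whether its scores are linearly related to MOS. An algorithm whose scores decrease as quality improves (e.g. a distortion measure) yields an SROCC near -1 instead.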

Other measures of correlation include the linear (Pearson's) correlation coefficient (LCC) and the root-mean-square error (RMSE) between the objective and subjective scores. LCC and RMSE are generally evaluated after passing the algorithm's scores through a logistic function. This allows the objective and subjective scores to be non-linearly related. For example, Figure 1 shows a scatter plot of MOS from the VQEG dataset against the scores of an FR VQA algorithm [18]. As one can see, the two are clearly correlated, but the relationship is non-linear. Transforming the scores with the logistic accounts for this non-linearity, and hence the application of LCC and RMSE makes sense. It is essential to point out that applying the logistic in no way constitutes 'training' an algorithm on the dataset (as some authors claim). It is simply a technique that allows the LCC and RMSE to be used as statistical measures of performance. A high value (close to 1) for LCC and a low value (close to 0) for RMSE indicate that the algorithm performs well.
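The logistic mapping followed by LCC and RMSE can be sketched as below. This is a minimal, stdlib-only sketch under stated assumptions: it uses a four-parameter logistic (the exact functional form used in a given study may differ), and it fits that logistic by a coarse grid over the centre `b3` and scale `b4`, solving for the asymptotes `b1` and `b2` in closed form at each grid point. That grid search is a simplification; a nonlinear least-squares solver would normally be used. All function names and the toy data are hypothetical.

```python
import math
from statistics import mean


def logistic(x, b1, b2, b3, b4):
    """Four-parameter logistic: moves from b2 (low x) to b1 (high x), centred at b3, scale b4 > 0."""
    return b2 + (b1 - b2) / (1.0 + math.exp(-(x - b3) / b4))


def fit_logistic(obj, mos, n_grid=60):
    """Least-squares fit of the logistic mapping objective scores to MOS.

    For fixed (b3, b4) the model is linear in (b1, b2):
        f(x) = b1 * s(x) + b2 * (1 - s(x)),  s(x) = 1 / (1 + exp(-(x - b3) / b4)),
    so (b1, b2) come from 2x2 normal equations and (b3, b4) from a coarse grid.
    """
    lo, hi = min(obj), max(obj)
    span = hi - lo
    if span == 0:
        raise ValueError("objective scores must not all be equal")
    best_sse, best_params = math.inf, None
    for i in range(n_grid):
        b3 = lo + span * i / (n_grid - 1)  # centre within the data range
        for j in range(1, n_grid + 1):
            b4 = span * j / n_grid  # positive scale; b1 < b2 permits a decreasing fit
            s = [1.0 / (1.0 + math.exp(-(x - b3) / b4)) for x in obj]
            suu = sum(u * u for u in s)
            svv = sum((1 - u) ** 2 for u in s)
            suv = sum(u * (1 - u) for u in s)
            suy = sum(u * y for u, y in zip(s, mos))
            svy = sum((1 - u) * y for u, y in zip(s, mos))
            det = suu * svv - suv * suv
            if abs(det) < 1e-12:  # s(x) nearly constant: skip this grid point
                continue
            b1 = (svv * suy - suv * svy) / det
            b2 = (suu * svy - suv * suy) / det
            sse = sum((b1 * u + b2 * (1 - u) - y) ** 2 for u, y in zip(s, mos))
            if sse < best_sse:
                best_sse, best_params = sse, (b1, b2, b3, b4)
    return best_params


def pearson(x, y):
    """Pearson's linear correlation coefficient (LCC)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(pred, target):
    """Root-mean-square error between predicted and subjective scores."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))


def evaluate(obj, mos):
    """Return (LCC, RMSE) after mapping objective scores through the fitted logistic."""
    params = fit_logistic(obj, mos)
    predicted = [logistic(x, *params) for x in obj]
    return pearson(predicted, mos), rmse(predicted, mos)


# Hypothetical toy data (not from any real study): MOS saturates at both ends.
obj = [20, 24, 28, 32, 36, 40, 44]
mos = [21.1, 23.9, 32.5, 50.0, 67.5, 76.1, 78.9]
print(pearson(obj, mos))   # raw LCC, before the logistic
print(evaluate(obj, mos))  # (LCC, RMSE) after the logistic
```

The fitted logistic is monotonic, so it leaves the rank order of the scores unchanged (or exactly reversed, for a decreasing fit). It therefore cannot alter the magnitude of the SROCC; it only affects LCC and RMSE.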

Having summarized how one would analyze a VQA algorithm, let us move on to the human visual system, whose properties are of tremendous importance for developing VQA algorithms.

¹ Why PSNR is a poor measure of visual quality is described in [7] and [8].
