with the subjective MOS be made publicly available. One such publicly available
dataset for VQA is the popular Video Quality Experts Group (VQEG) FRTV Phase-I
dataset [6]. The VQEG dataset consists of 20 reference videos, each subjected to 16
different distortions to form a total of 320 distorted videos. In [6], a study of various
algorithms was conducted on this dataset and it was shown that none of the assessed
algorithms were statistically better than peak signal-to-noise ratio (PSNR)¹! Over
the years, many new FR VQA algorithms that perform well on this dataset have
been proposed [9, 10]. However, the VQEG dataset is not without its drawbacks.
The dataset is dated, since the report of the study was released in the year 2000.
Previous-generation compression techniques such as MPEG [11] were used to produce
distortions. Current-generation compression standards such as H.264/AVC
[12] exhibit different perceptual distortions and hence a database that covers the
H.264/AVC compression standard is relevant for modern systems. Further, the
perceptual separation of videos in the VQEG dataset is poor, leading to inconsis-
tent judgments for humans and algorithms. In order to alleviate many such prob-
lems associated with the VQEG dataset, researchers from the Laboratory for Image
and Video Engineering (LIVE) have created two new VQA datasets. The LIVE
databases are now available for non-commercial research purposes; information
may be found online [13, 14]. The LIVE VQA datasets include modern
compression techniques such as H.264/AVC and various channel-induced distortions.
Descriptions of the datasets and the evaluated algorithms may be found in
[15] and [16].
Now that we have a dataset with subjective MOS and scores from an algo-
rithm, our goal is to study the correlation between them. In order to do so, Spear-
man's Rank Ordered Correlation Coefficient (SROCC) [17] is generally used [6].
An SROCC of 1 indicates that the two sets of data under study are perfectly correlated
in rank order, i.e., the algorithm orders the videos exactly as the subjects do.
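As a rough illustration of the rank-correlation step, the sketch below computes SROCC as the Pearson correlation of the ranks, using only the Python standard library. The function names (average_ranks, pearson, srocc) and the five score pairs are hypothetical and chosen for this example; they are not taken from [6] or [17].

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def srocc(objective, subjective):
    """Spearman's rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(objective), average_ranks(subjective))


# Hypothetical scores for five videos (illustrative values only).
mos = [1.2, 2.5, 3.1, 4.0, 4.8]
algo = [0.10, 0.35, 0.38, 0.70, 0.99]  # monotonic in MOS, but not linear
print(srocc(algo, mos))
```

Because the hypothetical algorithm scores increase whenever MOS increases, the two rank orderings agree exactly and the printed SROCC is 1.0, even though the relationship between the raw scores is not linear.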
Other measures of correlation include the Linear (Pearson's) correlation coefficient
(LCC) and the root-mean-square error (RMSE) between the objective and subjective
scores. LCC and RMSE are generally evaluated after passing the algorithm scores
through a logistic function. This allows for the objective and subjective scores to
be non-linearly related. For example, Figure 1 shows a scatter plot of MOS
from the VQEG dataset against the scores of an FR VQA algorithm [18]. As one can see,
the two are clearly correlated, but the relationship is non-linear. Transforming
the scores with the logistic accounts for this non-linearity, and hence applying
LCC and RMSE makes sense. It is essential to point out that applying the logistic
in no way constitutes 'training' an algorithm on the dataset (as some authors
claim). It is simply a technique that allows for application of the LCC and RMSE
as statistical measures of performance. A high value (close to 1) for LCC and a low
value (close to 0) for RMSE indicate that the algorithm performs well.
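The logistic mapping followed by LCC and RMSE can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of [6]: the text does not give the form of the logistic, so the sketch assumes f(x) = a * sigmoid(slope * (x - midpoint)) + c and fits it with a coarse grid search over slope and midpoint, solving for a and c by ordinary least squares. All function names, the grid, and the score values are hypothetical.

```python
import math
from statistics import mean


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def pearson(x, y):
    """Linear (Pearson) correlation coefficient (LCC)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(pred, target):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(mean([(p - t) ** 2 for p, t in zip(pred, target)]))


def linear_fit(z, y):
    """Least-squares a, c such that y is approximately a * z + c."""
    mz, my = mean(z), mean(y)
    var_z = sum((v - mz) ** 2 for v in z)
    if var_z == 0.0:
        return 0.0, my
    a = sum((v - mz) * (w - my) for v, w in zip(z, y)) / var_z
    return a, my - a * mz


def fit_logistic(objective, mos, steps=40):
    """Fit the assumed logistic by grid search; return the mapping function."""
    lo, hi = min(objective), max(objective)
    span = hi - lo
    if span == 0:
        raise ValueError("objective scores must not all be equal")
    best = None
    for i in range(steps + 1):
        midpoint = lo + span * i / steps
        for k in (1, 2, 4, 8, 16):  # slope expressed relative to the score range
            slope = k / span
            z = [sigmoid(slope * (x - midpoint)) for x in objective]
            a, c = linear_fit(z, mos)  # a and c enter linearly: closed form
            err = rmse([a * v + c for v in z], mos)
            if best is None or err < best[0]:
                best = (err, a, slope, midpoint, c)
    _, a, slope, midpoint, c = best
    return lambda x: a * sigmoid(slope * (x - midpoint)) + c


# Hypothetical objective scores and MOS for eleven videos (illustrative only).
objective = [i / 10 for i in range(11)]
mos = [1.1, 1.2, 1.3, 1.7, 2.2, 3.0, 3.8, 4.3, 4.6, 4.8, 4.9]

mapping = fit_logistic(objective, mos)
mapped = [mapping(x) for x in objective]

print("LCC before mapping:", pearson(objective, mos))
print("LCC after mapping: ", pearson(mapped, mos))
print("RMSE after mapping:", rmse(mapped, mos))
```

The mapping has only a handful of free parameters and is monotonic, so it rescales the objective scores without changing their rank order. This is one way to see why the logistic step does not amount to training the algorithm: the SROCC of a positively correlated algorithm is unchanged by it.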
Having summarized how one would analyze a VQA algorithm, let us move on to
the human visual system whose properties are of tremendous importance for devel-
oping VQA algorithms.
¹ Why PSNR is a poor measure of visual quality is described in [7] and [8].