Then the temporal quality index can be defined as follows:
\[
Q_T(i_0) = 1 - \frac{1}{N} \sum_{n=1}^{N} \left( v_n^{r} - v_n^{d} \right)^2 . \tag{11.32}
\]
The frame level error indices for both spatial and temporal components of MOVIE at a frame $t_j$ are defined as
\[
FE_S(t_j) = \frac{\sigma_{Q_S(x, y, t_j)}}{\mu_{Q_S(x, y, t_j)}}, \qquad
FE_T(t_j) = \frac{\sigma_{Q_T(x, y, t_j)}}{\mu_{Q_T(x, y, t_j)}} . \tag{11.33}
\]
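Finally, the overall MOVIE score is obtained by
\[
\mathrm{MOVIE} = \mathrm{SpatialMOVIE} \times \mathrm{TemporalMOVIE}, \tag{11.34}
\]
\[
\mathrm{SpatialMOVIE} = \frac{1}{\tau} \sum_{j=1}^{\tau} FE_S(t_j), \tag{11.35}
\]
\[
\mathrm{TemporalMOVIE} = \sqrt{\frac{1}{\tau} \sum_{j=1}^{\tau} FE_T(t_j)} . \tag{11.36}
\]
where $\tau$ denotes the number of frames.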
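The pooling in Eqs. (11.33)-(11.36) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `frame_error_index` and `movie_score` are hypothetical, each quality map is assumed to be a small 2D list of positive values, the standard deviation is assumed to be the population form, and the square root applied to the temporal pooling is taken from the published MOVIE definition.

```python
import math
from statistics import mean, pstdev


def frame_error_index(quality_map):
    """Coefficient of variation (std / mean) of one frame's quality values.

    Assumes a 2D list of positive quality values and a population
    standard deviation; both are assumptions of this sketch.
    """
    values = [q for row in quality_map for q in row]
    return pstdev(values) / mean(values)


def movie_score(spatial_maps, temporal_maps):
    """Pool per-frame error indices into (MOVIE, SpatialMOVIE, TemporalMOVIE)."""
    fe_s = [frame_error_index(m) for m in spatial_maps]
    fe_t = [frame_error_index(m) for m in temporal_maps]
    spatial_movie = mean(fe_s)              # average over frames
    temporal_movie = math.sqrt(mean(fe_t))  # square root of the frame average
    return spatial_movie * temporal_movie, spatial_movie, temporal_movie


# Toy data: one frame with uniform quality, one with uneven quality.
uniform = [[0.9, 0.9], [0.9, 0.9]]   # no spread, so its error index is 0
varied = [[1.0, 0.5], [1.0, 0.5]]    # std 0.25, mean 0.75, error index 1/3

score, spatial, temporal = movie_score([uniform, varied], [uniform, varied])
print(score, spatial, temporal)
```

Because each frame-level index is a ratio of spread to mean, a frame whose quality map is uniform contributes zero, while a frame with localized quality drops contributes more even when its mean quality is unchanged.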
11.2.2.2 Spatial-Temporal Structural Information Based Video Quality Metric
From the HVS perspective, natural videos are not random collections of pixels
but have strong structural and statistical dependencies in both the spatial and
temporal domains. Understanding the psychological and physiological properties of
HVS perception enables the development of better algorithms for video quality
assessment. In the spatial domain, the basic primitives of natural scenes perceived
by the HVS are edges, ridges, bars, and junctions. These image primitives are
strongly structured, which means that the pixel energy is concentrated and arranged
regularly along certain orientations. Along the temporal axis, displacement of these
spatial primitives gives the HVS the perception of motion. As illustrated in Fig. 11.4,
the edge contour of a moving object sweeps out a plane along its motion trajectory
in the spatiotemporal space. In other words, the energy in the localized space-time
region concentrates along a certain orientation. Both the object edge e and its motion
trajectory v lie in a plane that is orthogonal to this primary direction p.
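The geometry described above can be checked numerically on a synthetic moving edge. The following is a minimal sketch under stated assumptions, not the algorithm of this section: the test signal `intensity`, the helper names, and the use of central differences (in place of 3D Sobel kernels) are all choices made here for illustration. It accumulates the outer products of the spatiotemporal gradients into a 3x3 structure tensor, takes the dominant eigenvector as the primary direction p, and verifies that p is orthogonal to both the edge direction e and the motion trajectory v.

```python
import math


def intensity(x, y, t):
    # Hypothetical test signal: a smooth vertical edge (constant along y)
    # that moves one pixel per frame in the +x direction.
    return math.tanh(0.5 * (x - t - 4.0))


def gradient(x, y, t):
    # Central differences along x, y, and t (a stand-in for 3D Sobel kernels).
    gx = (intensity(x + 1, y, t) - intensity(x - 1, y, t)) / 2.0
    gy = (intensity(x, y + 1, t) - intensity(x, y - 1, t)) / 2.0
    gt = (intensity(x, y, t + 1) - intensity(x, y, t - 1)) / 2.0
    return (gx, gy, gt)


def structure_tensor(points):
    # Sum of gradient outer products g g^T over a local space-time region.
    tensor = [[0.0] * 3 for _ in range(3)]
    for (x, y, t) in points:
        g = gradient(x, y, t)
        for a in range(3):
            for b in range(3):
                tensor[a][b] += g[a] * g[b]
    return tensor


def dominant_eigenvector(tensor, iterations=50):
    # Power iteration; the start vector is arbitrary but must not be
    # orthogonal to the dominant eigenvector.
    vec = [1.0, 0.5, 0.25]
    for _ in range(iterations):
        w = [sum(tensor[a][b] * vec[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(c * c for c in w))
        vec = [c / norm for c in w]
    return vec


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


region = [(x, y, t) for x in range(1, 8) for y in range(1, 8) for t in range(1, 4)]
p = dominant_eigenvector(structure_tensor(region))

e = (0.0, 1.0, 0.0)  # edge direction in (x, y, t): the edge runs along y
v = (1.0, 0.0, 1.0)  # motion trajectory: one pixel in x per frame in t

print("p =", p)
print("p . e =", dot(p, e), " p . v =", dot(p, v))
```

For this signal every gradient points along (1, 0, -1), so the tensor has a single non-zero eigenvalue and both dot products vanish up to rounding. Real video regions yield less concentrated tensors, so how strongly one eigenvalue dominates becomes informative.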
Figure 11.5 illustrates the flowchart of the algorithm. The first step of the algorithm
is to collect gradient information: 3D Sobel kernels are applied to calculate
the local gradients. An attention-related mechanism is then incorporated into our
algorithm to determine which pixels need to be processed. Once the salient pixels have
been selected, we construct a pair of 3D structure tensors for each salient pixel
in both the reference video and the distorted video, and then perform eigenvalue
 