the same amount of acuity. For example, stare at any one word in this sentence and
then try (without moving your eye) to read the beginning of this paragraph. You
will notice that even though the word that you are staring at is extremely clear, as
you move away from the word under focus, you start losing resolution. This is
referred to as foveation. If you haven't thought about this before, it may come as a
surprise, since the world seems sharp in daily life. This is because the eye implements
an efficient engineering solution (given the constraints). The HVS is designed such
that, when viewing a scene, the eye makes rapid movements called saccades
interleaved with fixations. A fixation, as the name suggests, refers to the process of
looking at a particular location for an extended period of time. Little to no informa-
tion is gathered during a saccade and most information is gathered during a fixation.
Using this strategy of eye movements where the region of maximum visual acu-
ity (fovea) is placed at one location for a short period of time, and then moved to
another, the HVS constructs a 'high resolution' map of the scene.
We described the process of foveation in some detail because, for HD videos,
foveated video coding can help reduce bandwidth while maintaining perceived
picture quality. Indeed, foveation-driven video coding is an active area of research
[19].
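To make the idea concrete, the falloff of acuity with eccentricity can be captured by a weight map that is 1 at the fixation point and decays with distance from it. The following is a minimal illustrative sketch only; the function name, the halving eccentricity, and the pixels-per-degree figure are assumptions for the example, not values from this chapter:

```python
import numpy as np

def foveation_weights(height, width, fix_y, fix_x,
                      half_sens_ecc=2.3, pixels_per_degree=32.0):
    """Hypothetical foveation weight map: sensitivity is 1.0 at the
    fixation point (fix_y, fix_x) and halves every `half_sens_ecc`
    degrees of eccentricity. Both parameter values are illustrative."""
    y, x = np.mgrid[0:height, 0:width]
    # Eccentricity: distance from fixation, converted to visual degrees.
    ecc_deg = np.hypot(y - fix_y, x - fix_x) / pixels_per_degree
    return 0.5 ** (ecc_deg / half_sens_ecc)

# Weight map for a 64x64 frame fixated at its center.
w = foveation_weights(64, 64, 32, 32)
```

A foveated coder could use such a map to allocate fewer bits (or coarser quantization) to regions far from the predicted fixation point.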
VQA systems which seek to emulate the HVS generally model the first stage
of processing using a point-spread-function (PSF) to mimic the low-pass response
of the human eye. The responses from the receptors in the eye are fed to the retinal
ganglion cells. These are generally modeled using center-surround filters, since gan-
glion cells have been shown to possess on-center off-surround structure [2]. Similar
models are used for the LGN. The next stage of the HVS is area V1. The neurons
in V1 have been shown to be sensitive to direction, orientation, scale and so on. A
multi-scale, multi-orientation decomposition is generally used to mimic this. Better
models for V1 involve using multi-scale Gabor filterbanks [9]. Since we are con-
cerned with video, we skip areas V2 and V4 and move on to area V5/MT. This area
is responsible for processing motion information. Motion estimates are of great
importance to the human observer, since they are used for depth perception, for
judging the velocities of oncoming objects, and so on. The engineering equivalent
of this stage is estimating optical flow [20] from the frames of a video; a coarser
approximation is block-based motion estimation [21]. As we shall see, most NR
VQA algorithms do not use this information and currently perform only frame-based
spatial computations.
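The multi-scale, multi-orientation decomposition used to model V1 can be built from Gabor kernels, as mentioned above. Below is a minimal sketch with NumPy; the function names, kernel size, and the wavelength/bandwidth schedule are illustrative choices, not the parameters of any specific model from the literature:

```python
import numpy as np

def gabor_kernel(size, wavelength, orientation, sigma):
    """One real-valued Gabor kernel: a cosine carrier at the given
    orientation, windowed by an isotropic Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Project coordinates onto the carrier's orientation.
    x_theta = x * np.cos(orientation) + y * np.sin(orientation)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * x_theta / wavelength)
    return envelope * carrier

def gabor_filterbank(scales, orientations, size=21):
    """Multi-scale, multi-orientation bank, as a crude V1 model."""
    bank = []
    for s in range(scales):
        wavelength = 4.0 * (2 ** s)   # coarser carrier at each scale
        sigma = 0.5 * wavelength      # envelope widens with wavelength
        for o in range(orientations):
            theta = o * np.pi / orientations
            bank.append(gabor_kernel(size, wavelength, theta, sigma))
    return bank

# 3 scales x 4 orientations = 12 oriented, band-pass kernels.
bank = gabor_filterbank(scales=3, orientations=4)
```

Convolving a frame with each kernel in the bank yields the oriented, band-pass responses that stand in for V1 simple-cell outputs.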
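The block-based motion estimation mentioned above can be sketched as an exhaustive full search: for each block of the current frame, find the displacement into the previous frame that minimizes the sum of absolute differences (SAD). A minimal sketch assuming grayscale frames as NumPy arrays (the function name and the block/search sizes are illustrative):

```python
import numpy as np

def block_motion(prev, curr, block=8, search=4):
    """Exhaustive block matching: for each block-aligned patch of `curr`,
    return the (dy, dx) displacement into `prev` with minimum SAD."""
    h, w = curr.shape
    flow = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            target = curr[by:by + block, bx:bx + block].astype(int)
            best, best_sad = (0, 0), np.inf
            # Search a (2*search+1)^2 window of candidate displacements.
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue
                    cand = prev[y:y + block, x:x + block].astype(int)
                    sad = np.abs(cand - target).sum()
                    if sad < best_sad:
                        best_sad, best = sad, (dy, dx)
            flow[by // block, bx // block] = best
    return flow

# Recover a known global shift between two synthetic frames.
rng = np.random.default_rng(0)
prev = rng.integers(0, 256, (32, 32))
curr = np.roll(prev, shift=(2, 3), axis=(0, 1))
flow = block_motion(prev, curr)
```

Video coders use such block displacements for motion-compensated prediction; optical flow [20] refines this to a dense, per-pixel motion field.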
In the HVS, the responses from MT/V5 are further sent to higher levels of the
brain for processing. We do not discuss them here. The interested reader is referred
to [2] for details.
We have now seen how the human visual system works and how algorithms that
seek to judge quality as a human observer would are themselves evaluated. We also
listed some considerations for HD video. That said, one should note that any NR/RR
VQA technique that is proposed can be used for quality assessment of HD video;
additional HD-specific considerations may improve the performance of these
algorithms.
In the rest of this chapter we shall discuss RR and NR algorithms for VQA. Unless
otherwise stated, the discussed algorithms do not utilize color information - i.e.,