the same amount of acuity. For example, stare at any one word in this sentence and
then try (without moving your eye) to read the beginning of this paragraph. You
will notice that even though the word that you are staring at is extremely clear, as
you move away from the word under focus, you start losing resolution. This is
referred to as foveation. If you haven't thought about this before, it may come as a
surprise, since the world seems sharp in daily life. This is because the eye implements
an efficient engineering solution (given the constraints). The HVS is designed such
that, when viewing a scene, the eye makes rapid movements called saccades
interleaved with fixations. A fixation, as the name suggests, refers to the process of
looking at a particular location for an extended period of time. Little to no informa-
tion is gathered during a saccade and most information is gathered during a fixation.
Using this strategy of eye movements where the region of maximum visual acu-
ity (fovea) is placed at one location for a short period of time, and then moved to
another, the HVS constructs a 'high resolution' map of the scene.
We described the process of foveation in some detail because, for HD videos,
foveated video coding can help reduce bandwidth while maintaining perceived
picture quality. Indeed, foveation-driven video coding is an active area of research
[19].
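To make the idea concrete, the falloff of acuity with eccentricity can be captured by a weight map that is 1 at the fixation point and decays with distance from it. The following is a minimal illustrative sketch only; the function name, the halving eccentricity, and the pixels-per-degree figure are assumptions for the example, not values from this chapter:

```python
import numpy as np

def foveation_weights(height, width, fix_y, fix_x,
                      half_sens_ecc=2.3, pixels_per_degree=32.0):
    """Hypothetical foveation weight map: sensitivity is 1.0 at the
    fixation point (fix_y, fix_x) and halves every `half_sens_ecc`
    degrees of eccentricity. Both parameter values are illustrative."""
    y, x = np.mgrid[0:height, 0:width]
    # Eccentricity: distance from fixation, converted to visual degrees.
    ecc_deg = np.hypot(y - fix_y, x - fix_x) / pixels_per_degree
    return 0.5 ** (ecc_deg / half_sens_ecc)

# Weight map for a 64x64 frame fixated at its center.
w = foveation_weights(64, 64, 32, 32)
```

A foveated coder could use such a map to allocate fewer bits (or coarser quantization) to regions far from the predicted fixation point.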
VQA systems which seek to emulate the HVS generally model the first stage
of processing using a point-spread-function (PSF) to mimic the low-pass response
of the human eye. The responses from the receptors in the eye are fed to the retinal
ganglion cells. These are generally modeled using center-surround filters, since gan-
glion cells have been shown to possess on-center off-surround structure [2]. Similar
models are used for the LGN. The next stage of the HVS is area V1. The neurons
in V1 have been shown to be sensitive to direction, orientation, scale and so on. A
multi-scale, multi-orientation decomposition is generally used to mimic this. Better
models for V1 involve using multi-scale Gabor filterbanks [9]. Since we are con-
cerned with video, we skip areas V2 and V4 and move on to area V5/MT. This area
is responsible for processing motion information. Motion estimates are of great
importance to the human observer, since they are used for depth perception, for
judging the velocities of oncoming objects, and so on. The engineering equivalent
of this stage is estimating optical flow [20] from the frames of a video; a coarser
approximation is block-based motion estimation [21]. As we shall see, most NR
VQA algorithms do not use this information and currently perform only frame-based
spatial computations.
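The multi-scale, multi-orientation decomposition used to model V1 can be built from Gabor kernels, as mentioned above. Below is a minimal sketch with NumPy; the function names, kernel size, and the wavelength/bandwidth schedule are illustrative choices, not the parameters of any specific model from the literature:

```python
import numpy as np

def gabor_kernel(size, wavelength, orientation, sigma):
    """One real-valued Gabor kernel: a cosine carrier at the given
    orientation, windowed by an isotropic Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Project coordinates onto the carrier's orientation.
    x_theta = x * np.cos(orientation) + y * np.sin(orientation)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * x_theta / wavelength)
    return envelope * carrier

def gabor_filterbank(scales, orientations, size=21):
    """Multi-scale, multi-orientation bank, as a crude V1 model."""
    bank = []
    for s in range(scales):
        wavelength = 4.0 * (2 ** s)   # coarser carrier at each scale
        sigma = 0.5 * wavelength      # envelope widens with wavelength
        for o in range(orientations):
            theta = o * np.pi / orientations
            bank.append(gabor_kernel(size, wavelength, theta, sigma))
    return bank

# 3 scales x 4 orientations = 12 oriented, band-pass kernels.
bank = gabor_filterbank(scales=3, orientations=4)
```

Convolving a frame with each kernel in the bank yields the oriented, band-pass responses that stand in for V1 simple-cell outputs.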
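The block-based motion estimation mentioned above can be sketched as an exhaustive full search: for each block of the current frame, find the displacement into the previous frame that minimizes the sum of absolute differences (SAD). A minimal sketch assuming grayscale frames as NumPy arrays (the function name and the block/search sizes are illustrative):

```python
import numpy as np

def block_motion(prev, curr, block=8, search=4):
    """Exhaustive block matching: for each block-aligned patch of `curr`,
    return the (dy, dx) displacement into `prev` with minimum SAD."""
    h, w = curr.shape
    flow = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            target = curr[by:by + block, bx:bx + block].astype(int)
            best, best_sad = (0, 0), np.inf
            # Search a (2*search+1)^2 window of candidate displacements.
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue
                    cand = prev[y:y + block, x:x + block].astype(int)
                    sad = np.abs(cand - target).sum()
                    if sad < best_sad:
                        best_sad, best = sad, (dy, dx)
            flow[by // block, bx // block] = best
    return flow

# Recover a known global shift between two synthetic frames.
rng = np.random.default_rng(0)
prev = rng.integers(0, 256, (32, 32))
curr = np.roll(prev, shift=(2, 3), axis=(0, 1))
flow = block_motion(prev, curr)
```

Video coders use such block displacements for motion-compensated prediction; optical flow [20] refines this to a dense, per-pixel motion field.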
In the HVS, the responses from MT/V5 are further sent to higher levels of the
brain for processing. We do not discuss them here. The interested reader is referred
to [2] for details.
We have now seen how the human visual system works and how algorithms that
seek to judge quality as a human observer would are themselves evaluated. We also
listed some considerations for HD video. That said, one should note that any NR/RR
VQA technique that is proposed can be used for quality assessment of HD video;
additional HD-specific considerations may improve the performance of these
algorithms.
In the rest of this chapter we shall discuss RR and NR algorithms for VQA. Unless
otherwise stated, the discussed algorithms do not utilize color information - i.e.,