Y is the luma component. Cb and Cr are the chroma components corresponding to the blue color difference (Y − B) and the red color difference (Y − R), respectively. The luma components alone form a grayscale image. The human visual system is less sensitive to the chroma parts of an image. If we sub-sample the chroma parts by 2 horizontally and vertically (e.g., average the neighboring 2 × 2 chroma pixels), there is almost no perceptual quality degradation. By removing this kind of perceptual redundancy with chroma subsampling, the amount of information is reduced by a factor of 2 (from 1(Y) + 1(Cb) + 1(Cr) = 3 to 1(Y) + 1/4(Cb) + 1/4(Cr) = 1.5). Chroma subsampling is widely adopted in video coding standards. However, in emerging HDTV applications, there is a trend to preserve all the color information to provide more vivid videos.
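To make the averaging step concrete, the sketch below (in Python with NumPy, which the chapter itself does not use; the function name and frame size are illustrative assumptions) halves the chroma resolution horizontally and vertically by averaging each 2 × 2 block, leaving the luma plane untouched.

```python
import numpy as np

def subsample_chroma_2x2(cb, cr):
    """Sub-sample the chroma planes by 2 horizontally and vertically
    by averaging each neighboring 2 x 2 block of chroma pixels."""
    h, w = cb.shape                      # assumes even dimensions
    cb_sub = cb.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    cr_sub = cr.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return cb_sub, cr_sub

# Hypothetical 720 x 1280 frame: Y keeps full resolution, while Cb and Cr
# shrink to a quarter of their size, so 1(Y) + 1(Cb) + 1(Cr) = 3 planes of
# samples become 1(Y) + 1/4(Cb) + 1/4(Cr) = 1.5.
y  = np.zeros((720, 1280))
cb = np.zeros((720, 1280))
cr = np.zeros((720, 1280))
cb_sub, cr_sub = subsample_chroma_2x2(cb, cr)
print(y.size, cb_sub.size, cr_sub.size)   # 921600 230400 230400
```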
2.2 Prediction
The second functional block is prediction. Prediction is usually the most computation-intensive part of current video coding standards. This unit tries to find the similarity within a video sequence. Prediction techniques can be categorized into temporal prediction and spatial prediction. Temporal prediction exploits the temporal similarity between consecutive frames. Without a scene change, the objects in a video remain almost the same, and consecutive images differ only slightly due to object movement. Therefore, the current frame can be predicted well from the previous frame. Temporal prediction is also called inter prediction or inter-frame prediction. Spatial prediction exploits the spatial similarity between neighboring pixels. In a region with smooth texture, neighboring pixels are very similar, so each pixel can be predicted well from its surrounding ones. Spatial prediction is also called intra prediction or intra-frame prediction.
Simple examples of prediction are shown in Fig. 3. With prediction, the corresponding predictor is subtracted from each pixel in the current frame. For temporal prediction in Fig. 3b, the predictor for each pixel in the current frame is the pixel at the same location in the previous frame, as shown in Eq. (2).
Predictor(x, y, t) = Pixel(x, y, t − 1)    (2)
Pixel(x, y, t) and Predictor(x, y, t) are the values of the original and predicted pixels, respectively, at coordinate (x, y) of the frame at time t. The coordinate (0, 0) is the upper-left corner of an image. For spatial prediction in Fig. 3c, the predictor is the average of the upper and left pixels, as shown in Eq. (3).
Predictor(x, y, t) = (Pixel(x − 1, y, t) + Pixel(x, y − 1, t)) / 2    (3)
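A minimal sketch of both predictors is given below, assuming full frames stored as NumPy arrays; the constant value of 128 used for neighbors outside the image border is an assumption of the sketch, not something stated in the text.

```python
import numpy as np

def temporal_residues(curr, prev):
    """Eq. (2): the predictor of each pixel is the co-located pixel of the
    previous frame; the residue is pixel minus predictor."""
    return curr.astype(int) - prev.astype(int)

def spatial_residues(curr, border=128):
    """Eq. (3): the predictor is the average of the upper and left pixels of
    the same frame.  Pixels outside the image are replaced by an assumed
    constant `border` value."""
    frame = curr.astype(int)
    h, w = frame.shape
    pred = np.empty_like(frame)
    for y in range(h):
        for x in range(w):
            up   = frame[y - 1, x] if y > 0 else border
            left = frame[y, x - 1] if x > 0 else border
            pred[y, x] = (up + left) // 2
    return frame - pred
```

In a real encoder the spatial predictor is formed from already reconstructed neighbors rather than the original pixels; the original pixels are used here only to keep the sketch short.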
As we can see, most of the pixel difference values (also called residues) after prediction are close to zero. Spatial prediction is better for videos with less texture, like the Foreman sequence. On the other hand, temporal prediction is better for videos with less motion, like the Weather sequence. If the prediction is accurate, the residues are almost all zeros. In this case, the entropy of the image is low, and thus entropy coding can achieve good coding performance.
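As a rough illustration of this last point, the toy example below (a smooth synthetic gradient frame, chosen only for demonstration) compares the zero-order entropy of the raw pixels with that of the spatial residues from Eq. (3); accurate prediction drives the residue entropy far below that of the original samples.

```python
import numpy as np

def entropy_bits(values):
    """Zero-order entropy in bits per sample of an integer-valued array."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Smooth synthetic 64 x 64 gradient frame (illustrative only).
idx = np.arange(64)
frame = (2 * np.add.outer(idx, idx)).astype(int)        # values 0..252
pred = (frame[:-1, 1:] + frame[1:, :-1]) // 2           # Eq. (3), interior pixels
residue = frame[1:, 1:] - pred
# The raw pixels need several bits per sample, while the residues are all
# equal here, so their zero-order entropy is 0.
print(entropy_bits(frame), entropy_bits(residue))
```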