Y is the luma component. Cb and Cr are the chroma components corresponding to the blue color difference (Y − B) and the red color difference (Y − R), respectively. The luma components alone form a grayscale image. The human visual system is less sensitive to the chroma parts of an image. If we sub-sample the chroma parts by 2 horizontally and vertically (e.g., average the neighboring 2 × 2 chroma pixels), there is almost no perceptual quality degradation. By removing this kind of perceptual redundancy with chroma subsampling, the amount of information is reduced by a factor of 2 (from 1(Y) + 1(Cb) + 1(Cr) = 3 to 1(Y) + 1/4(Cb) + 1/4(Cr) = 1.5). Chroma subsampling is widely adopted in video coding standards. However, in emerging HDTV applications, there is a trend to preserve all the color information to provide more vivid videos.
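To make the averaging step concrete, the sketch below (in Python with NumPy, which the chapter itself does not use; the function name and frame size are illustrative assumptions) halves the chroma resolution horizontally and vertically by averaging each 2 × 2 block, leaving the luma plane untouched.

```python
import numpy as np

def subsample_chroma_2x2(cb, cr):
    """Sub-sample the chroma planes by 2 horizontally and vertically
    by averaging each neighboring 2 x 2 block of chroma pixels."""
    h, w = cb.shape                      # assumes even dimensions
    cb_sub = cb.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    cr_sub = cr.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return cb_sub, cr_sub

# Hypothetical 720 x 1280 frame: Y keeps full resolution, while Cb and Cr
# shrink to a quarter of their size, so 1(Y) + 1(Cb) + 1(Cr) = 3 planes of
# samples become 1(Y) + 1/4(Cb) + 1/4(Cr) = 1.5.
y  = np.zeros((720, 1280))
cb = np.zeros((720, 1280))
cr = np.zeros((720, 1280))
cb_sub, cr_sub = subsample_chroma_2x2(cb, cr)
print(y.size, cb_sub.size, cr_sub.size)   # 921600 230400 230400
```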
2.2 Prediction
The second functional block is prediction. Prediction is usually the most computation-intensive part of current video coding standards. This unit tries to find the similarity within a video sequence. Prediction techniques can be categorized into temporal prediction and spatial prediction. Temporal prediction exploits the temporal similarity between consecutive frames. Without a scene change, the objects in a video remain almost the same, and consecutive images differ only slightly due to object movement. Therefore, the current frame can be predicted well from the previous frame. Temporal prediction is also called inter prediction or inter-frame prediction. Spatial prediction exploits the spatial similarity between neighboring pixels. In a region with smooth texture, neighboring pixels are very similar, so each pixel can be predicted well from its surrounding ones. Spatial prediction is also called intra prediction or intra-frame prediction.
Simple examples of prediction are shown in Fig. 3. With prediction, the corresponding predictor is subtracted from each pixel in the current frame. For temporal prediction in Fig. 3b, the predictor for each pixel in the current frame is the pixel at the same location in the previous frame, as shown in Eq. (2).
Predictor(x, y, t) = Pixel(x, y, t − 1)    (2)
Pixel(x, y, t) and Predictor(x, y, t) are the values of the original and predicted pixels, respectively, at coordinate (x, y) of the frame at time t. The coordinate (0, 0) is the upper-left corner of an image. For spatial prediction in Fig. 3c, the predictor is the average of the upper and left pixels, as shown in Eq. (3).
Predictor(x, y, t) = (Pixel(x − 1, y, t) + Pixel(x, y − 1, t)) / 2    (3)
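A minimal sketch of both predictors is given below, assuming full frames stored as NumPy arrays; the constant value of 128 used for neighbors outside the image border is an assumption of the sketch, not something stated in the text.

```python
import numpy as np

def temporal_residues(curr, prev):
    """Eq. (2): the predictor of each pixel is the co-located pixel of the
    previous frame; the residue is pixel minus predictor."""
    return curr.astype(int) - prev.astype(int)

def spatial_residues(curr, border=128):
    """Eq. (3): the predictor is the average of the upper and left pixels of
    the same frame.  Pixels outside the image are replaced by an assumed
    constant `border` value."""
    frame = curr.astype(int)
    h, w = frame.shape
    pred = np.empty_like(frame)
    for y in range(h):
        for x in range(w):
            up   = frame[y - 1, x] if y > 0 else border
            left = frame[y, x - 1] if x > 0 else border
            pred[y, x] = (up + left) // 2
    return frame - pred
```

In a real encoder the spatial predictor is formed from already reconstructed neighbors rather than the original pixels; the original pixels are used here only to keep the sketch short.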
As we can see, most of the pixel difference values (also called residues) after prediction are close to zero. Spatial prediction is better for videos with less texture, like the Foreman sequence. On the other hand, temporal prediction is better for videos with less motion, like the Weather sequence. If the prediction is accurate, the residues are almost all zeros. In this case, the entropy of the image is low, and thus entropy coding can achieve good coding performance.
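As a rough illustration of this last point, the toy example below (a smooth synthetic gradient frame, chosen only for demonstration) compares the zero-order entropy of the raw pixels with that of the spatial residues from Eq. (3); accurate prediction drives the residue entropy far below that of the original samples.

```python
import numpy as np

def entropy_bits(values):
    """Zero-order entropy in bits per sample of an integer-valued array."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Smooth synthetic 64 x 64 gradient frame (illustrative only).
idx = np.arange(64)
frame = (2 * np.add.outer(idx, idx)).astype(int)        # values 0..252
pred = (frame[:-1, 1:] + frame[1:, :-1]) // 2           # Eq. (3), interior pixels
residue = frame[1:, 1:] - pred
# The raw pixels need several bits per sample, while the residues are all
# equal here, so their zero-order entropy is 0.
print(entropy_bits(frame), entropy_bits(residue))
```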