Media Compression - Scalable Continuous Media Streaming Systems


2.3 Temporal Dependencies in Compressed Video

Video encoders typically exploit three types of redundancy to reduce the compressed video
bit-rate, namely, spatial redundancy, temporal redundancy, and entropy. Spatial redundancy
refers to the correlation between pixels within the same video frame. Exploiting it is known
as intra-frame coding, as only pixels within the same video frame are used in the encoding
process. The resultant encoded video frame, commonly called the I frame, can be decoded
independently.

Temporal redundancy refers to the correlation between adjacent frames. As a video captures
a snapshot of a scene periodically at, say, 25 to 30 fps, adjacent frames will likely contain
very similar visual objects, often with some displacement due to motion of the objects or the
camera. Thus, the encoder can exploit this correlation by predicting a video frame from the
neighboring frames. In MPEG, for example, this is done through the use of predictive frames
(P frames) and bi-directional predictive frames (B frames), as shown in Figure 2.4.

Specifically, beginning with an intra-coded I frame, the encoder first predicts the P frame
using a process called motion estimation. In motion estimation, the encoder searches the
I frame for blocks of pixels similar to those in the to-be-encoded P frame. After the search is
completed, only the displacement of each block (due to motion) and the prediction errors are
encoded to form the data for the P frame. Thus, a P frame can be encoded using substantially
fewer bits than an I frame. This encoded P frame is then used to predict the next P frame,
and so on until another I frame is introduced.
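The search described above can be sketched as a full-search block matcher. This is a minimal illustration, not the method of any particular MPEG encoder. The matching cost (sum of absolute differences), the block size `n`, the search range `search`, and the function names are all assumptions made for this example. Frames are plain lists of rows of integer pixel values.

```python
def sad(ref, cur, rx, ry, cx, cy, n):
    """Sum of absolute differences between the n x n block of `ref` at
    (rx, ry) and the n x n block of `cur` at (cx, cy)."""
    return sum(
        abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
        for j in range(n)
        for i in range(n)
    )


def estimate_motion(ref, cur, cx, cy, n=4, search=2):
    """Full-search block matching (hypothetical helper for illustration).

    For the block of the current frame whose top-left corner is (cx, cy),
    try every displacement within +/- `search` pixels in the reference
    frame and keep the one with the lowest SAD.

    Returns ((dx, dy), residual), where `residual` holds the per-pixel
    prediction errors (current block minus best-matching reference block).
    """
    height, width = len(ref), len(ref[0])
    best = None  # (cost, dx, dy) of the best candidate seen so far
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidate blocks that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(ref, cur, rx, ry, cx, cy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    _, dx, dy = best
    residual = [
        [cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i] for i in range(n)]
        for j in range(n)
    ]
    return (dx, dy), residual
```

Only the displacement `(dx, dy)` and the `residual` would then need to be coded for the block. When the match is good, the residual is mostly zeros, which is why the P frame needs fewer bits.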

In addition to P frames, a number of B frames are also introduced between a pair of anchor
frames (I or P frames). These B frames, as shown in Figure 2.4, are predicted from both anchor
frames to further reduce the resultant bit-rate. Therefore, B frames usually consume fewer
bits than the P frames and I frames in the same video stream. Both P and B frames are
called inter-coded frames. Note that, unlike I frames, P and B frames cannot be decoded
independently. Instead, the required anchor frames must first be decoded and then used in
decoding the inter-coded frames. This has two implications for media streaming.

First, as shown in Figure 2.4, the temporal dependencies dictate that a B frame cannot
be decoded for playback until both of its anchor frames have been received and decoded. Thus, if
the media server streams out the video data in temporal order, the client will
need to buffer the B frames until the second anchor frame arrives before decoding of
the B frames can proceed. In practice, the video encoder often re-orders the frame sequence
according to the decoding order, as shown in Figure 2.5, to reduce the client buffer requirement.
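The re-ordering can be illustrated with a short sketch that moves each anchor frame ahead of the B frames that depend on it. It shows one plausible re-ordering and is not necessarily the exact sequence of Figure 2.5. The function name and the handling of trailing B frames (appended unchanged) are assumptions made for this example.

```python
def to_decode_order(display_types):
    """Reorder a display-order sequence of frame types ('I', 'P', 'B')
    so that every B frame comes after both of its anchor frames.

    Returns a list of (display_index, frame_type) tuples in decoding order.
    """
    decode_order = []
    pending_b = []  # B frames still waiting for their second (later) anchor
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append((index, frame_type))
        else:
            # Emit the anchor (I or P) first, then the B frames that
            # precede it in display order and depend on it.
            decode_order.append((index, frame_type))
            decode_order.extend(pending_b)
            pending_b = []
    # Assumption: trailing B frames with no later anchor in this
    # sequence are appended unchanged.
    decode_order.extend(pending_b)
    return decode_order


# Display order I B B P B B P becomes I P B B P B B in decoding order.
print(to_decode_order("IBBPBBP"))
```

With this ordering, the client can decode each B frame as soon as it arrives, because both anchors have already been received. It no longer has to hold B frames while waiting for the later anchor.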

Figure 2.4 Temporal dependencies in compressed video
