Advances of MPEG Scalable Video Coding Standard - Intelligent Multimedia Data Hiding

into multiple motion layers. Each layer records the motion vectors with a

specified accuracy. The lowest of these layers denotes only a rough representa-

tion of the motion vectors. The higher layers are used to refine the accuracy.

Different layers are coded independently so that the motion information can

be truncated at the layer boundary.
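
The layering described above can be illustrated with a minimal sketch. It assumes motion-vector components stored as integers in quarter-pel units, an integer-pel base layer, and one-bit refinement layers (half-pel, then quarter-pel). The layer layout and the function names `split_motion_layers` and `reconstruct_motion` are hypothetical choices for illustration, not the scheme of any cited codec.

```python
def split_motion_layers(mv_qpel, num_layers=3):
    """Split one motion-vector component (quarter-pel units) into layers.

    Layer 0 is the coarse integer-pel value; each later layer carries a
    one-bit refinement. This layout is an illustrative assumption.
    """
    shift = num_layers - 1                 # number of fractional bits
    layers = [mv_qpel >> shift]            # base layer: floor to integer-pel
    for level in range(shift - 1, -1, -1):
        layers.append((mv_qpel >> level) & 1)   # next refinement bit
    return layers


def reconstruct_motion(layers, num_layers=3):
    """Rebuild the component (quarter-pel units) from the layers received."""
    value = layers[0]
    for bit in layers[1:]:
        value = (value << 1) | bit
    # Rescale when refinement layers were truncated away.
    missing = num_layers - len(layers)
    return value << missing


layers = split_motion_layers(13)           # 13 quarter-pels = 3.25 pixels
full = reconstruct_motion(layers)          # all layers received
coarse = reconstruct_motion(layers[:1])    # truncated at the base layer
```

Here `full` recovers 13, while `coarse` is 12 (3.0 pixels): truncating at a layer boundary still gives a usable, but less accurate, motion vector.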

Due to the mismatch between the truncated motion information and the

residual data, schemes with scalable motion information suffer from drifting

errors. To address this, a linear model is proposed in [26] to provide a better

trade-off between the scalable representation and the rate-distortion performance.

3.3.4 The (2D+t+2D) Structure

The 2D+t structure is suggested to reduce the drawbacks of the t+2D struc-

ture such as inaccurate motion vectors at lower spatial resolutions and drifting

errors. Fig. 3.2 (c) shows a typical 2D+t+2D scheme. The 2D+t structure is

also given where the second spatial decomposition is omitted. The first 2D

spatial transform is usually a multi-level dyadic wavelet transform and is

called the pre-spatial decomposition or transform. In this structure, MCTF

is applied to each spatial sub-band generated by the pre-spatial transform.

One drawback is that motion estimation on the low-resolution images is

less accurate than motion estimation at high resolution, because only

low-resolution references are available.
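
A multi-level dyadic pre-spatial decomposition can be sketched as follows. The text does not specify the wavelet filter, so this example assumes the Haar filter in its averaging form, chosen only because it is the simplest. Frames are plain lists of rows with even dimensions at every level. The function names and the LL/LH/HL/HH naming order are illustrative assumptions.

```python
def haar_1d(signal):
    """One Haar analysis step: (low-pass averages, high-pass differences)."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def filter_columns(block):
    """Apply haar_1d down every column of a block given as a list of rows."""
    pairs = [haar_1d(list(col)) for col in zip(*block)]
    low = [list(row) for row in zip(*(p[0] for p in pairs))]
    high = [list(row) for row in zip(*(p[1] for p in pairs))]
    return low, high


def haar_2d(frame):
    """One 2D Haar level: split a frame into LL, LH, HL, HH sub-bands."""
    row_low, row_high = [], []
    for row in frame:
        lo, hi = haar_1d(row)
        row_low.append(lo)
        row_high.append(hi)
    ll, lh = filter_columns(row_low)
    hl, hh = filter_columns(row_high)
    return ll, lh, hl, hh


def pre_spatial_decomposition(frame, levels):
    """Dyadic decomposition: only the LL band is split again at each level."""
    detail_bands = []
    ll = frame
    for _ in range(levels):
        ll, lh, hl, hh = haar_2d(ll)
        detail_bands.append((lh, hl, hh))
    return ll, detail_bands


frame = [[r * 4 + c for c in range(4)] for r in range(4)]
ll, details = pre_spatial_decomposition(frame, levels=2)
```

In the structure described above, MCTF would then be run separately on `ll` and on each detail band. That step is omitted here.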

Based on the above observations, a second spatial wavelet decomposition,

called the post-spatial decomposition (or transform), is added as shown in Fig.

3.2 (c) [22, 23]. The VidWav evaluation software can produce various

combinations of pre- and post-spatial decompositions. Three examples are

shown in Fig. 3.18. The first example, Fig. 3.18 (a), shows the outputs of a

conventional t+2D scheme with 3 temporal levels. The resulting four temporal sub-bands

are t-LLL, t-LLH, t-LH, and t-H. A 2-level or 3-level post-spatial decompo-

sition is then applied depending on the temporal sub-band. Since the t-H

sub-band contains high-pass signals, it is reasonable to choose a smaller band

split. Fig. 3.18 (b) shows a two-level pre-spatial decomposition. Only the low-

est spatial band is further split by the post-spatial transforms. Fig. 3.18 (c)

demonstrates a possible combination based on a similar concept when the

pre-spatial transform has only two levels. This structure provides a lot of

flexibility in trading temporal against spatial coding efficiency. For example,

it enables inter-layer prediction between the lower and higher spatial

resolutions, an idea adopted by the AVC-based SVC. More sophisticated

schemes in the 2D+t+2D category, such as the Spatial-Temporal tool (STool),

have also been suggested as additions to the VidWav software. In STool,

inter-layer prediction is applied to the lower spatial band after MCTF [24, 25].
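
The three-level temporal split of the first example above (t-LLL, t-LLH, t-LH, and t-H) can be sketched as follows. This is a simplified illustration that assumes a plain Haar filter along the time axis with no motion compensation, whereas MCTF would filter along motion trajectories. Frames are flat lists of pixel values, and the function names are hypothetical.

```python
def temporal_haar_level(frames):
    """One temporal Haar level over consecutive frame pairs (no motion)."""
    low, high = [], []
    for f0, f1 in zip(frames[0::2], frames[1::2]):
        low.append([(a + b) / 2 for a, b in zip(f0, f1)])
        high.append([(a - b) / 2 for a, b in zip(f0, f1)])
    return low, high


def temporal_decomposition(frames, levels=3):
    """Return the temporal sub-bands keyed by name, e.g. 't-H', 't-LLL'."""
    bands = {}
    low = frames
    prefix = ""
    for _ in range(levels):
        low, high = temporal_haar_level(low)
        bands["t-" + prefix + "H"] = high   # high-pass output of this level
        prefix += "L"                       # only the low-pass is split again
    bands["t-" + prefix] = low
    return bands


# A group of 8 tiny frames (4 pixels each) whose brightness rises over time.
gop = [[float(i)] * 4 for i in range(8)]
bands = temporal_decomposition(gop, levels=3)
```

With eight input frames, the three levels yield four t-H frames, two t-LH frames, one t-LLH frame, and a single t-LLL frame. A post-spatial decomposition would then be applied to each of these sub-bands.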

3.3.5 Enhanced Motion Compensated Filtering

Another problem in the conventional t+2D structure is that it produces image

artifacts in the low-pass temporally filtered images due to incorrect motion vectors.
