Transform and Quantization - Advanced Video Coding Systems

where X and Y represent the input and output data, which are generally both matrices of size M × N; C and R denote the transform matrices applied to the column and row vectors of X, and are N × N and M × M matrices, respectively.

To reconstruct the original input data, an inverse transform is applied. For the 2-D case, the inverse transform is formulated as follows:

X' = C' · Y · R',    (5.3)

where C' and R' denote the inverse transform matrices. When both C' · C and R · R' equal the identity matrix, which is the most common scenario, X' equals X, as the transform and inverse transform can losslessly reconstruct the input data.
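The forward and inverse relations can be checked numerically. The following is a minimal sketch in plain Python, under several assumptions that are not stated in the text: the 2-D forward transform has the separable form Y = C · X · R, the size M × N is read as width × height (so X has N rows and M columns), and C and R are built from the orthonormal DCT-II matrix, whose inverse is its transpose. The helper names dct_matrix, matmul, and transpose, and the sample values in X, are illustrative only.

```python
import math

def dct_matrix(n):
    """Orthonormal n x n DCT-II matrix; row k holds the k-th basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

# Assumption: an M x N block is M samples wide and N samples tall.
M, N = 4, 2
X = [[52.0, 55.0, 61.0, 66.0],
     [63.0, 59.0, 55.0, 90.0]]

C = dct_matrix(N)              # N x N, acts on the columns of X
R = transpose(dct_matrix(M))   # M x M, acts on the rows of X

Y = matmul(matmul(C, X), R)    # forward transform: Y = C . X . R

C_inv = transpose(C)           # orthonormal, so the inverse is the transpose
R_inv = transpose(R)
X_rec = matmul(matmul(C_inv, Y), R_inv)   # inverse: X' = C' . Y . R'

# Largest absolute reconstruction error; only floating-point rounding remains.
max_error = max(abs(X_rec[i][j] - X[i][j])
                for i in range(N) for j in range(M))
print("max reconstruction error:", max_error)
```

Because C' · C and R · R' are both identity matrices here, the reconstruction differs from X only by floating-point rounding.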

By applying the transform, i.e., the mathematical operation of Eq. 5.1 or Eq. 5.2, the input vector or block, which generally consists of samples that are correlated with their neighbors, is mapped to another vector or block whose entries are commonly called transform coefficients. In the context of data coding, the transform is designed to yield transform coefficients that are nearly uncorrelated with each other. For jointly Gaussian random variables, which are often used to model natural image signals, uncorrelated variables are also independent. Therefore, an efficient transform, which largely de-correlates the input source, allows entropy coding to be applied efficiently.
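A small worked case makes the de-correlation effect concrete. The sketch below assumes two zero-mean, unit-variance neighboring samples with correlation rho (the value 0.9 is chosen only for illustration) and a 1-D transform y = C · x that takes their normalized sum and difference. The covariance of the coefficients is then C · Σ · C^T, where Σ is the covariance of the samples; the names cov_x and cov_y are illustrative.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

rho = 0.9                      # assumed correlation between the two samples
cov_x = [[1.0, rho],
         [rho, 1.0]]           # covariance of the input samples

s = 1.0 / math.sqrt(2.0)
C = [[s,  s],                  # normalized sum of the two samples
     [s, -s]]                  # normalized difference of the two samples

# Covariance of the transform coefficients: C . cov_x . C^T
cov_y = matmul(matmul(C, cov_x), transpose(C))

print("input covariance      :", cov_x)
print("coefficient covariance:", cov_y)
```

In this case the off-diagonal entries of the coefficient covariance vanish, and the variances become 1 + rho and 1 - rho: the two coefficients are uncorrelated, and most of the energy sits in the first one.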

It is noted from Eq. 5.2 that, to achieve uncorrelated transform coefficients, the transform design, i.e., C and R, is tied to the statistical distribution of the input source X. That is, differently distributed sources may call for different transform designs. It is well known that the optimum transform, in the sense of de-correlation, for a given input data source is the Karhunen-Loève transform (KLT) (Kolmogoroff 1931; Loeve 1978), whose transform matrix consists of the eigenvectors of the covariance matrix of the source. In video coding, the Discrete Cosine Transform (DCT) (Ahmed et al. 1974) has become the most popular transform design because of its low complexity and high efficiency. In terms of complexity, because its transform matrix is fixed and symmetrical, the DCT can be implemented with fast, low-complexity algorithms. In terms of de-correlation efficiency, it has been both theoretically proved and experimentally confirmed that the DCT approximates the optimum KLT under first-order Markov conditions, which approximate natural imagery sources (Clarke 1981).
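How closely the DCT approaches the KLT for a first-order Markov source can be checked with a small experiment. The sketch below assumes a unit-variance source with covariance rho^|i-j| (rho = 0.95 and N = 8 are illustrative choices) and uses the transform coding gain, i.e., the ratio of the arithmetic mean to the geometric mean of the coefficient variances, as the efficiency measure; this measure is a common one but is not taken from the text. For this covariance the determinant equals (1 - rho^2)^(N-1), so the KLT gain has a closed form and no eigen-decomposition is needed.

```python
import math

def dct_matrix(n):
    """Orthonormal n x n DCT-II matrix; row k holds the k-th basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def coding_gain(variances):
    """Arithmetic mean of the variances divided by their geometric mean."""
    n = len(variances)
    arithmetic = sum(variances) / n
    geometric = math.exp(sum(math.log(v) for v in variances) / n)
    return arithmetic / geometric

N, rho = 8, 0.95   # illustrative block length and correlation
cov = [[rho ** abs(i - j) for j in range(N)] for i in range(N)]
C = dct_matrix(N)

# Variance of DCT coefficient k: the k-th diagonal entry of C . cov . C^T
dct_vars = [sum(C[k][i] * cov[i][j] * C[k][j]
                for i in range(N) for j in range(N))
            for k in range(N)]

dct_gain = coding_gain(dct_vars)

# KLT variances are the eigenvalues of cov: their mean is 1 (unit variance)
# and their product is det(cov) = (1 - rho^2)^(N - 1).
klt_gain = (1.0 - rho ** 2) ** (-(N - 1) / N)

print("DCT coding gain:", dct_gain)
print("KLT coding gain:", klt_gain)
print("DCT / KLT ratio:", dct_gain / klt_gain)
```

No orthonormal transform can exceed the KLT gain under this measure, so the ratio printed on the last line indicates how much efficiency the fixed DCT gives up for this particular source model.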

For practical implementation, the DCT is further approximated using a very limited set of transform sizes and integer operations. Furthermore, researchers have made continuous efforts in recent years to explore more flexible transform designs. As image sources are generally 2-D and uncertain, a fixed-size square DCT cannot efficiently capture the correlation of all image sources. This limitation of the current DCT design leaves room for further improvement of transform efficiency. Several novel transform designs have been proposed in the literature, and some have been successfully applied in the latest video coding standards, such as AVS2 and High Efficiency Video Coding (HEVC). Sect. 5.2 discusses the detailed technical design of the transform, especially for AVS2.
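The idea of approximating the DCT with integer operations can be illustrated by scaling the orthonormal DCT matrix and rounding each entry. This is only a sketch: the scale factor 128, the 4-point size, and the plain rounding are illustrative assumptions, and the resulting matrix is not the transform matrix of AVS2, HEVC, or any other standard, each of which specifies its own integer matrices.

```python
import math

def dct_matrix(n):
    """Orthonormal n x n DCT-II matrix; row k holds the k-th basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

SCALE = 128   # illustrative scale factor, not taken from any standard
N = 4

# Integer approximation: scale every entry and round to the nearest integer.
T = [[round(SCALE * c) for c in row] for row in dct_matrix(N)]

# Gram matrix T . T^T: the off-diagonal entries show how orthogonal the
# rows remain, the diagonal entries show the squared norm of each row.
gram = [[sum(T[i][k] * T[j][k] for k in range(N)) for j in range(N)]
        for i in range(N)]

for row in T:
    print(row)
print("squared row norms:", [gram[i][i] for i in range(N)])
```

With these choices the rounded rows stay mutually orthogonal but no longer share the same norm, which is one reason integer transforms need careful design rather than plain rounding.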
