HEVC Transform and Quantization - High Efficiency Video Coding (HEVC)

In the H.261, MPEG-1, H.262/MPEG-2, and H.263 video coding standards, an

8-point IDCT was specified with infinite precision. To ensure interoperability and to

minimize drift between encoder and decoder implementations using finite precision,

two features were included in the standards. First, block-level periodic intra refresh

was mandatory. Second, a conformance test for the accuracy of the IDCT using a

pseudo-random test pattern was specified.

In the H.264/MPEG-4 Advanced Video Coding (AVC) standard [ 15 ], the

problem of encoder-decoder drift was solved by specifying integer-valued 4×4

and 8×8 transform matrices. The transforms were designed as approximations to

the IDCT with emphasis on minimizing the number of arithmetic operations. These

transforms had large variations in the norms of their basis vectors. As a consequence,

non-flat default de-quantization matrices were specified to compensate for

the different norms of the basis vectors [ 20 ].
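
The effect of unequal basis-vector norms can be illustrated with a short sketch. It assumes the widely published 4×4 forward core matrix of H.264/AVC (the name `H264_CF4` and the helper functions are hypothetical, introduced only for this illustration) and computes the squared norm of every 1D basis vector and of every 2D basis function.

```python
# 4x4 forward core transform matrix commonly published for H.264/AVC.
# Assumption: used here only to illustrate unequal basis-vector norms.
H264_CF4 = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def squared_row_norms(matrix):
    """Return the squared Euclidean norm of each basis vector (row)."""
    return [sum(c * c for c in row) for row in matrix]


def squared_norms_2d(matrix):
    """Squared norm of each 2D basis function of the separable transform.

    The 2D basis function at position (i, j) is the outer product of
    rows i and j, so its squared norm is the product of their squared norms.
    """
    r = squared_row_norms(matrix)
    return [[ri * rj for rj in r] for ri in r]


row_norms = squared_row_norms(H264_CF4)
table = squared_norms_2d(H264_CF4)
for row in table:
    print(row)
```

The table contains three distinct values (16, 40, and 100), so a uniform scaling of all coefficient positions cannot normalize the transform. This is the kind of position-dependent compensation that a non-flat de-quantization matrix provides.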

During the development of HEVC, several different approximations of the IDCT

were studied for the core transform. The first version of the HEVC Test Model

(HM1) used the H.264/AVC transforms for 4×4 and 8×8 blocks and an integer

approximation of Chen's fast IDCT [ 7 ] for 16×16 and 32×32 blocks. The HM1

inverse transforms had the following characteristics [ 23 , 28 ]:

• Non-flat de-quantization matrices for all transform sizes: While acceptable for small transform sizes, the implementation cost of using de-quantization matrices for larger transforms is high because of the larger block sizes.

• Different architectures for different transform sizes: This leads to increased area since hardware sharing across different transform sizes is difficult.

• A 20-bit transpose buffer used for storing intermediate results after the first transform stage of the 2D transform: An increased transpose buffer size leads to larger memory and higher memory bandwidth. In hardware, the transpose buffer area can be significant and comparable to the transform logic area [ 30 ].

• Full factorization architecture requiring cascaded multipliers and intermediate rounding for 16- and 32-point transforms: This increases data path dependencies and impacts parallel processing performance. It also leads to increased bit width for multipliers and accumulators (32 bits and 64 bits, respectively, in software). In hardware, in addition to the area increase, it also leads to increased circuit delay, thereby limiting the maximum frequency at which the inverse transform block can operate.
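
The cost of the transpose buffer width can be made concrete with simple arithmetic. The sketch below computes the storage needed to hold one block of intermediate samples between the two transform stages. The function name `transpose_buffer_bits` is hypothetical, and the 16-bit width is an assumption chosen here purely as a comparison point; only the 20-bit figure comes from the text above.

```python
def transpose_buffer_bits(block_size, sample_bits):
    """Storage, in bits, for one block_size x block_size block of
    intermediate samples held between the two 1D transform stages."""
    return block_size * block_size * sample_bits


# 20 bits is the HM1 width quoted above; 16 bits is an assumed alternative
# used only for comparison.
for n in (4, 8, 16, 32):
    wide = transpose_buffer_bits(n, 20)
    narrow = transpose_buffer_bits(n, 16)
    print(f"{n}x{n}: {wide} bits at 20-bit, {narrow} bits at 16-bit")
```

For a 32×32 block this gives 20480 bits at 20-bit samples against 16384 bits at 16-bit samples, a 25% difference in buffer storage under these assumptions.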

To address the complexity concerns of the HM1 transforms, a matrix-

multiplication-based core transform was proposed in [ 10 ] and eventually adopted as the

HEVC core transform. The design goal was to develop a transform that was efficient

to implement both in software on SIMD machines and in hardware. Alternative

proposals to the HEVC core transform design can be found in [ 1 , 9 , 17 ].
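
As an illustration of what a matrix multiplication based transform looks like, the following minimal sketch applies a 4×4 inverse transform as two plain matrix products with a rounding shift after each stage. The coefficients in `HEVC_D4` are the 4-point matrix as commonly published for HEVC. The shift values (7 after the first stage, 20 minus the bit depth after the second) and the 16-bit clip of the intermediate result are assumptions based on a common reading of the standard for 8-bit video, and the function names are hypothetical. It is not a conformant decoder implementation.

```python
# 4-point core transform matrix as commonly published for HEVC.
HEVC_D4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]


def clip16(value):
    """Clip to the signed 16-bit range (assumed intermediate precision)."""
    return max(-32768, min(32767, value))


def inverse_transform_4x4(coeffs, bit_depth=8):
    """Compute D^T * coeffs * D with a rounding right shift after each stage."""
    n = 4
    shift1 = 7               # assumed first-stage shift
    shift2 = 20 - bit_depth  # assumed second-stage shift (12 for 8-bit video)

    # Stage 1: 1D inverse transform of every column (tmp = D^T * coeffs).
    tmp = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = sum(HEVC_D4[k][i] * coeffs[k][j] for k in range(n))
            tmp[i][j] = clip16((acc + (1 << (shift1 - 1))) >> shift1)

    # Stage 2: 1D inverse transform of every row (out = tmp * D).
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = sum(tmp[i][k] * HEVC_D4[k][j] for k in range(n))
            out[i][j] = (acc + (1 << (shift2 - 1))) >> shift2
    return out


# A block with only a DC coefficient reconstructs to a flat residual.
dc_only = [[128, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
residual = inverse_transform_4x4(dc_only)
for row in residual:
    print(row)
```

Every output sample is a sum of products with no data-dependent control flow, which is why this formulation maps naturally onto SIMD multiply-accumulate instructions in software and onto a regular multiplier array in hardware.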

The HEVC core transform matrices were designed to have the following

properties [ 10 ]:

Closeness to the IDCT

Almost orthogonal basis vectors
