In the H.261, MPEG-1, H.262/MPEG-2, and H.263 video coding standards, an
8-point IDCT was specified with infinite precision. To ensure interoperability and to
minimize drift between encoder and decoder implementations using finite precision,
two features were included in the standards. First, block-level periodic intra refresh
was mandatory. Second, a conformance test for the accuracy of the IDCT using a
pseudo-random test pattern was specified.
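The effect of finite precision on an 8-point IDCT can be sketched with a small experiment. The code below is not the conformance procedure from any of the standards; it is a minimal illustration, assuming an orthonormal IDCT definition, a hypothetical fixed-point design in which the weights are quantized to `frac_bits` fractional bits, and an arbitrary coefficient range of [-256, 255]. All function names are made up for this example.

```python
import math
import random

N = 8  # transform length


def idct_basis(n, k):
    """Orthonormal IDCT weight for output sample n and coefficient k."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))


def idct_float(coeffs):
    """Reference IDCT in double precision, rounded to integers at the end."""
    return [round(sum(idct_basis(n, k) * coeffs[k] for k in range(N)))
            for n in range(N)]


def idct_fixed(coeffs, frac_bits):
    """IDCT with weights quantized to frac_bits fractional bits (hypothetical)."""
    one = 1 << frac_bits
    out = []
    for n in range(N):
        acc = sum(round(idct_basis(n, k) * one) * coeffs[k] for k in range(N))
        out.append((acc + (one >> 1)) >> frac_bits)  # round to nearest
    return out


def max_mismatch(frac_bits, trials=1000, seed=1):
    """Largest output difference between the two IDCTs over random inputs."""
    rng = random.Random(seed)
    worst = 0
    for _ in range(trials):
        coeffs = [rng.randint(-256, 255) for _ in range(N)]
        ref = idct_float(coeffs)
        got = idct_fixed(coeffs, frac_bits)
        worst = max(worst, max(abs(a - b) for a, b in zip(ref, got)))
    return worst


if __name__ == "__main__":
    for bits in (4, 8, 16):
        print(bits, "fractional bits -> max mismatch", max_mismatch(bits))
```

Coarser weights produce a larger mismatch against the reference. When encoder and decoder use different IDCT implementations, such mismatches accumulate through inter prediction, which is why the standards bounded IDCT accuracy and required periodic intra refresh.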
In the H.264/MPEG-4 Advanced Video Coding (AVC) standard [15], the
problem of encoder-decoder drift was solved by specifying integer-valued 4×4
and 8×8 transform matrices. The transforms were designed as approximations to
the IDCT with emphasis on minimizing the number of arithmetic operations. These
transforms had large variations in the norms of the basis vectors. As a consequence,
non-flat default de-quantization matrices were specified to compensate for
the different norms of the basis vectors [20].
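The variation in basis-vector norms can be checked directly. The sketch below assumes the widely documented H.264/AVC 4×4 forward core transform matrix, which is not reproduced in this text, and uses hypothetical helper names. It computes the squared norm of each row and the per-coefficient factors that a separable 2D transform would need to equalize them.

```python
import math

# H.264/AVC 4x4 forward core transform matrix (assumed from the standard,
# not given in this text). Each row is one basis vector.
H264_4X4 = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def row_norms_squared(matrix):
    """Squared Euclidean norm of every row (basis vector)."""
    return [sum(v * v for v in row) for row in matrix]


def scaling_matrix(matrix):
    """Factor 1 / (|r_i| * |r_j|) for coefficient (i, j) of a separable 2D transform."""
    norms = [math.sqrt(s) for s in row_norms_squared(matrix)]
    return [[1.0 / (a * b) for b in norms] for a in norms]


if __name__ == "__main__":
    print(row_norms_squared(H264_4X4))
    for row in scaling_matrix(H264_4X4):
        print(["%.4f" % f for f in row])
```

The squared norms alternate between 4 and 10, so the normalization factor depends on the coefficient position. Folding these position-dependent factors into quantization and de-quantization is what makes the matrices non-flat.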
During the development of HEVC, several different approximations of the IDCT
were studied for the core transform. The first version of the HEVC Test Model,
HM1, used the H.264/AVC transforms for 4×4 and 8×8 blocks and an integer
approximation of Chen's fast IDCT [7] for 16×16 and 32×32 blocks. The HM1
inverse transforms had the following characteristics [23, 28]:
• Non-flat de-quantization matrices for all transform sizes: While acceptable for
small transform sizes, the implementation cost of using de-quantization matrices
for larger transforms is high because of larger block sizes,
• Different architectures for different transform sizes: This leads to increased area
since hardware sharing across different transform sizes is difficult,
• A 20-bit transpose buffer used for storing intermediate results after the first
transform stage in 2D transform: An increased transpose buffer size leads to
larger memory and memory bandwidth. In hardware, the transpose buffer area
can be significant and comparable to transform logic area [30],
• Full factorization architecture requiring cascaded multipliers and intermediate
rounding for 16- and 32-point transforms: This increases data path dependencies
and impacts parallel processing performance. It also leads to increased bit width
for multipliers and accumulators (32 bits and 64 bits respectively in software).
In hardware, in addition to area increase, it also leads to increased circuit delay
thereby limiting the maximum frequency at which the inverse transform block
can operate.
To address the complexity concerns of the HM1 transforms, a
matrix-multiplication-based core transform was proposed in [10] and eventually adopted as the
HEVC core transform. The design goal was to develop a transform that was efficient
to implement in both software on SIMD machines and in hardware. Alternative
proposals to the HEVC core transform design can be found in [1, 9, 17].
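A matrix-multiplication-based transform can be sketched in a few lines. The example below assumes the 4-point HEVC core transform matrix as published in the HEVC specification (it is not reproduced in this text). The helper names and the single 14-bit normalization shift are simplifications made for illustration; the standard specifies its own intermediate shifts per transform stage.

```python
# 4-point HEVC core transform matrix (assumed from the HEVC specification).
HEVC_4X4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]


def matvec(matrix, vec):
    """Plain matrix-vector product using integer arithmetic only."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def gram(matrix):
    """Dot products between all pairs of rows (M times M transposed)."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in matrix]
            for r1 in matrix]


def forward_1d(samples):
    """1D forward transform as a single matrix multiplication."""
    return matvec(HEVC_4X4, samples)


def inverse_1d(coeffs, shift=14):
    """1D inverse transform: multiply by the transpose, then normalize.

    The shift of 14 bits approximates division by the row norm squared
    (about 2**14); it is a simplification chosen for this sketch.
    """
    raw = matvec(transpose(HEVC_4X4), coeffs)
    return [(v + (1 << (shift - 1))) >> shift for v in raw]


if __name__ == "__main__":
    for row in gram(HEVC_4X4):
        print(row)
    x = [10, 20, 30, 40]
    print(inverse_1d(forward_1d(x)))
```

Each output is an independent sum of products with no cascaded multipliers or intermediate rounding, which suits SIMD multiply-accumulate instructions. For this 4-point matrix the rows are mutually orthogonal and their squared norms (16384 and 16370) differ by less than 0.1%, so no position-dependent scaling is needed to compensate for norm differences.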
The HEVC core transform matrices were designed to have the following
properties [10]:
• Closeness to the IDCT
• Almost orthogonal basis vectors