In order to give the developers of encoder and decoder products as much
freedom as possible while ensuring interoperability between devices of different
manufacturers, the video coding standards specify only the bitstream syntax and
the result of the decoding process,¹ while the encoding process is left out of scope.
However, the coding efficiency of a particular encoder depends to a large extent
on the encoding algorithm that is used for determining the values of the syntax
elements written to the bitstream. This includes the selection of coding modes,
associated prediction parameters, quantization parameters as well as quantization
indices for the transform coefficients. A conceptually simple and very effective class
of encoding algorithms is based on Lagrangian bit allocation [31, 39, 43]. With
these approaches, the coding parameters p are determined by minimizing a
weighted sum of the resulting distortion D and the associated number of bits R over
the set A of available choices,

    p* = arg min_{p ∈ A}  D(p) + λ · R(p).    (3.1)
The Lagrange parameter λ is a constant that determines the trade-off between the
distortion D and the number of bits R, and thus both the quality of the reconstructed
video and the bit rate of the bitstream.
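The minimization in Eq. (3.1) can be sketched in a few lines of code. The function name, candidate modes, and numeric values below are purely illustrative; an actual encoder would measure D(p) (e.g., as the sum of squared differences against the original block) and R(p) (the bits actually produced by entropy coding) for each candidate.

```python
# Hypothetical sketch of Lagrangian mode decision as in Eq. (3.1):
# pick the candidate p minimizing D(p) + lambda * R(p).

def lagrangian_mode_decision(candidates, lam):
    """Return the candidate with minimal Lagrangian cost.

    candidates: list of (params, distortion, rate) tuples
    lam: Lagrange multiplier trading distortion against rate
    """
    return min(candidates, key=lambda c: c[1] + lam * c[2])

# Example: three hypothetical coding modes for one block,
# with made-up distortion (SSD) and rate (bits) values.
modes = [
    ("intra",       1200.0,  40),  # low rate, high distortion
    ("inter_16x16",  500.0,  90),
    ("inter_8x8",    300.0, 180),  # high rate, low distortion
]

# A small lambda favors low distortion; a large one favors low rate.
best_low_lam  = lagrangian_mode_decision(modes, lam=1.0)
best_high_lam = lagrangian_mode_decision(modes, lam=10.0)
```

Sweeping λ from small to large traces out the operational rate-distortion curve of the encoder: each λ selects the point where the curve has slope −1/λ.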
The coding efficiency that a hybrid video coding standard can achieve depends
on several design aspects such as the interpolation filters used for sub-sample
interpolation, the efficiency of the entropy coding, or the employed in-loop filtering
techniques. However, the main source of improvement from one standard generation
to the next is typically given by the increased number of supported possibilities for
coding a picture or a block of samples. This includes, for example, an increased
precision of motion vectors, a larger flexibility for choosing the coding order
of pictures, an extended set of available reference pictures, an increased number
of intra prediction modes, an increased number of motion vector predictors, an
increased number of supported transform sizes as well as an increased number of
block sizes for motion-compensated prediction.
In the following, we investigate the set of choices that are supported for
partitioning a picture into blocks for motion-compensated prediction, intra-picture
prediction, and transform coding. If we consider a given block of samples, different
subdivisions into blocks used for prediction or transform coding are associated with
different trade-offs between distortion and rate. When we subdivide a block into
multiple subblocks and select the best prediction parameters for each subblock, we
typically decrease the prediction error energy, but increase the bit rate required for
transmitting the prediction parameters. Whether a subdivision is advantageous in a
rate-distortion sense depends on the actual block of samples. By extending the set
of supported subdivision modes we typically increase the bit rate that is required for
¹ The standards specify an example decoding process. A decoder implementation is conforming
to a standard if it produces the same output pictures as the specified decoding process. For older
standards such as MPEG-2 Video, an accuracy requirement for the inverse transform is specified.
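The subdivision trade-off described above can also be cast as a Lagrangian comparison: a block is split only if the summed cost of coding its subblocks, including the extra bits needed to signal the subdivision, is lower than the cost of coding the block whole. The function names and numbers below are illustrative assumptions, not part of any standard.

```python
# Hypothetical sketch of a split/no-split decision in the Lagrangian sense.

def rd_cost(distortion, rate, lam):
    """Lagrangian cost D + lambda * R for one coding choice."""
    return distortion + lam * rate

def should_split(whole, subblocks, split_signal_bits, lam):
    """Split only if the total subblock cost, plus the bits needed to
    signal the subdivision, beats the cost of coding the block whole.

    whole: (distortion, rate) for coding the block as one unit
    subblocks: list of (distortion, rate), one per subblock
    split_signal_bits: side-information bits for signaling the split
    """
    cost_whole = rd_cost(*whole, lam)
    cost_split = lam * split_signal_bits + sum(
        rd_cost(d, r, lam) for d, r in subblocks)
    return cost_split < cost_whole

# Splitting reduces prediction error energy but raises the parameter rate:
whole = (800.0, 60)                                      # one big block
subs  = [(120.0, 35), (150.0, 30), (100.0, 40), (130.0, 35)]  # four subblocks
```

At a small λ the lower subblock distortion dominates and the split wins; at a large λ the extra signaling and prediction-parameter bits make coding the block whole cheaper, mirroring the dependence on λ (and hence on the target bit rate) described in the text.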