The reconstruction core generates reconstructed pixels identical to those seen by a decoder. It consists of inverse transform, inverse quantization, intra/inter prediction reconstruction, and loop filters (the deblocking and SAO filters). Note that the reconstruction core may share the same hardware with the prediction core, or retrieve the results from the prediction core, so it does not require significant additional cost compared to a standalone decoder.
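For illustration, the following is a minimal C sketch of this reconstruction dataflow for a single transform block. The function name and the simplified dequantization are hypothetical stand-ins for the hardware stages, not actual HEVC reference code.

```c
#include <stdint.h>

static uint8_t clip8(int v) { return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v); }

/* Sketch of decoder-matching reconstruction for one n x n transform block. */
static void reconstruct_block(int16_t *coeff, const uint8_t *pred,
                              uint8_t *recon, int n, int scale)
{
    /* Inverse quantization: rescale the decoded coefficient levels
       (simplified; real HEVC uses a QP-derived scale and shift). */
    for (int i = 0; i < n * n; i++)
        coeff[i] = (int16_t)(coeff[i] * scale);

    /* The inverse transform (IDCT/IDST) would run here, turning the
       scaled coefficients into a spatial residual; omitted in this sketch. */

    /* Add the intra/inter prediction and clip to the 8-bit sample range. */
    for (int i = 0; i < n * n; i++)
        recon[i] = clip8(pred[i] + coeff[i]);

    /* The in-loop filters (deblocking, then SAO) run afterwards, once the
       neighboring blocks needed at the block edges are also reconstructed. */
}
```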
Finally, the bitstream core performs entropy coding and writes out the final bitstream. In HEVC, entropy coding is performed with Context-Adaptive Binary Arithmetic Coding (CABAC). Note that the SAO parameters must be encoded in the bitstream, so SAO parameter derivation must be completed before the CABAC encoding stage.
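The snippet below sketches the per-frame stage ordering this constraint implies: SAO parameters are derived in the reconstruction stage so that they are available when the CTU reaches CABAC. Every function name here is a hypothetical placeholder, not a real API.

```c
typedef struct Frame Frame;   /* opaque placeholder for per-frame encoder state */

void predict_ctu(Frame *f, int c);            /* intra/inter search + decision */
void transform_quantize_ctu(Frame *f, int c);
void reconstruct_ctu(Frame *f, int c);        /* decoder-matching reconstruction */
void derive_sao_params_ctu(Frame *f, int c);
void cabac_encode_ctu(Frame *f, int c);

void encode_frame(Frame *f, int num_ctus)
{
    for (int c = 0; c < num_ctus; c++) {
        predict_ctu(f, c);
        transform_quantize_ctu(f, c);
        reconstruct_ctu(f, c);
        /* SAO parameters are part of the bitstream, so they must be
           derived before the CTU reaches the CABAC stage. */
        derive_sao_params_ctu(f, c);
    }
    for (int c = 0; c < num_ctus; c++)
        cabac_encode_ctu(f, c);   /* entropy code, SAO parameters included */
}
```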
11.2.2 CTU Processing Order
The CTU processing order in the encoder pipeline affects the reference data bandwidth and the CABAC throughput. Two modules are primarily impacted by the CTU processing order: CABAC and the reference memory subsystem. A change in the CTU processing order alters the order in which data enters CABAC. Because of the probability updates in CABAC, the encoder and decoder must perform entropy coding on the CTUs in the same order; thus the encoder can only entropy code the CTUs in the order defined by the HEVC specification (e.g., raster scan within slices or tiles). A change in the CTU processing order also alters the behavior of the reference memory subsystem: if the CTUs in a given processing order exhibit better data locality, the external bandwidth for reference frame access is lower.
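As a rough illustration of the locality effect, the sketch below compares the reference pixels fetched per CTU when each CTU refetches its full search window against raster-scan processing that reuses the window overlap with the previous CTU. The 64x64 CTU size and the ±64 search range are assumed example values.

```c
#include <stdio.h>

/* Illustrative estimate of how CTU-order data locality affects reference
 * bandwidth; CTU size and search range are assumed example values. */
int main(void)
{
    const int ctu = 64;                /* CTU width/height in pixels       */
    const int sr  = 64;                /* search range of +/- sr pixels    */
    const int win = ctu + 2 * sr;      /* reference window side: 192       */

    long no_reuse = (long)win * win;   /* refetch the full window per CTU  */
    long h_reuse  = (long)ctu * win;   /* raster order: fetch only the new
                                          columns; the rest overlaps the
                                          previous CTU's window            */

    printf("pixels per CTU, no reuse    : %ld\n", no_reuse); /* 36864 */
    printf("pixels per CTU, raster reuse: %ld\n", h_reuse);  /* 12288 */
    return 0;
}
```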
11.2.2.1 Pipeline Granularity
Considering the precedence constraint on CABAC input data, there are two options for arranging CABAC in the pipeline. The first option is to place CABAC in the CTU pipeline. The CABAC engine must then process the CTU data generated by the prediction engine strictly in order, and the throughput requirement for CABAC with CTU-level pipelining is the peak binary symbol (bin) rate per CTU. If this throughput cannot be reached, the whole pipeline stalls waiting for CABAC to complete, hurting overall performance. In the second option, CABAC is placed in a separate frame-level pipeline. Input data for CABAC is first stored in external memory; after the whole frame has passed through prediction and reconstruction, CABAC starts coding. This enables the prediction engine and the CABAC engine to process the CTUs of each frame in different orders. The throughput requirement for CABAC with frame-level pipelining is the peak bin rate per frame, or equivalently the average bin rate per CTU of the peak frame, which is generally much lower than the CTU-level requirement. With this arrangement, the CTU processing orders in the CABAC stage and in the other stages are independent, at the cost of extra external bandwidth.
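The gap between the two throughput requirements can be seen with a toy calculation; the per-CTU bin counts below are invented example values for a single frame, with one outlier CTU.

```c
#include <stdio.h>

/* Toy comparison of CABAC throughput requirements for the two pipeline
 * granularities; the bin counts are invented example values. */
int main(void)
{
    const int bins_per_ctu[] = { 800, 1200, 9000, 700, 1100, 900, 1300, 1000 };
    const int n = sizeof bins_per_ctu / sizeof bins_per_ctu[0];

    int peak = 0, total = 0;
    for (int i = 0; i < n; i++) {
        total += bins_per_ctu[i];
        if (bins_per_ctu[i] > peak) peak = bins_per_ctu[i];
    }

    /* CTU-level pipeline: must sustain the worst-case CTU to avoid stalls. */
    printf("required rate, CTU-level  : %d bins/CTU-slot\n", peak);      /* 9000 */
    /* Frame-level pipeline: only the frame's average per CTU matters.    */
    printf("required rate, frame-level: %d bins/CTU-slot\n", total / n); /* 2000 */
    return 0;
}
```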