The reconstruction core generates reconstructed pixels identical to those seen by a decoder. It consists of inverse transform, inverse quantization, intra/inter prediction reconstruction, and loop filters (the deblocking and SAO filters). Note that the reconstruction core may share the same hardware with the prediction core, or retrieve the results from the prediction core, so it does not require significant additional cost compared to a standalone decoder.
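For illustration, the following is a minimal C sketch of this reconstruction dataflow for a single transform block. The function name and the simplified dequantization are hypothetical stand-ins for the hardware stages, not actual HEVC reference code.

```c
#include <stdint.h>

static uint8_t clip8(int v) { return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v); }

/* Sketch of decoder-matching reconstruction for one n x n transform block. */
static void reconstruct_block(int16_t *coeff, const uint8_t *pred,
                              uint8_t *recon, int n, int scale)
{
    /* Inverse quantization: rescale the decoded coefficient levels
       (simplified; real HEVC uses a QP-derived scale and shift). */
    for (int i = 0; i < n * n; i++)
        coeff[i] = (int16_t)(coeff[i] * scale);

    /* The inverse transform (IDCT/IDST) would run here, turning the
       scaled coefficients into a spatial residual; omitted in this sketch. */

    /* Add the intra/inter prediction and clip to the 8-bit sample range. */
    for (int i = 0; i < n * n; i++)
        recon[i] = clip8(pred[i] + coeff[i]);

    /* The in-loop filters (deblocking, then SAO) run afterwards, once the
       neighboring blocks needed at the block edges are also reconstructed. */
}
```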
Finally, the bitstream core performs entropy coding and writes out the final bitstream. In HEVC, entropy coding is performed with Context-Adaptive Binary Arithmetic Coding (CABAC). Note that the SAO parameters must be encoded in the bitstream, so SAO parameter derivation must be completed before the CABAC encoding stage.
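The snippet below sketches the per-frame stage ordering this constraint implies: SAO parameters are derived in the reconstruction stage so that they are available when the CTU reaches CABAC. Every function name here is a hypothetical placeholder, not a real API.

```c
typedef struct Frame Frame;   /* opaque placeholder for per-frame encoder state */

void predict_ctu(Frame *f, int c);            /* intra/inter search + decision */
void transform_quantize_ctu(Frame *f, int c);
void reconstruct_ctu(Frame *f, int c);        /* decoder-matching reconstruction */
void derive_sao_params_ctu(Frame *f, int c);
void cabac_encode_ctu(Frame *f, int c);

void encode_frame(Frame *f, int num_ctus)
{
    for (int c = 0; c < num_ctus; c++) {
        predict_ctu(f, c);
        transform_quantize_ctu(f, c);
        reconstruct_ctu(f, c);
        /* SAO parameters are part of the bitstream, so they must be
           derived before the CTU reaches the CABAC stage. */
        derive_sao_params_ctu(f, c);
    }
    for (int c = 0; c < num_ctus; c++)
        cabac_encode_ctu(f, c);   /* entropy code, SAO parameters included */
}
```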
11.2.2 CTU Processing Order
The CTU processing order in the encoder pipeline affects the reference data bandwidth and the CABAC throughput. Two modules are primarily impacted by the CTU processing order: CABAC and the reference memory subsystem. A change in the CTU processing order alters the order in which data enters CABAC. Because of the probability updates in CABAC, the encoder and decoder must perform entropy coding on the CTUs in the same order; thus the encoder can only entropy code the CTUs in the order defined by the HEVC specification (e.g., raster scan within slices or tiles). A change in the CTU processing order also alters the behavior of the reference memory subsystem: if the CTUs in a given processing order exhibit better data locality, the external bandwidth for reference frame access is lower.
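As a rough illustration of the locality effect, the sketch below compares the reference pixels fetched per CTU when each CTU refetches its full search window against raster-scan processing that reuses the window overlap with the previous CTU. The 64x64 CTU size and the ±64 search range are assumed example values.

```c
#include <stdio.h>

/* Illustrative estimate of how CTU-order data locality affects reference
 * bandwidth; CTU size and search range are assumed example values. */
int main(void)
{
    const int ctu = 64;                /* CTU width/height in pixels       */
    const int sr  = 64;                /* search range of +/- sr pixels    */
    const int win = ctu + 2 * sr;      /* reference window side: 192       */

    long no_reuse = (long)win * win;   /* refetch the full window per CTU  */
    long h_reuse  = (long)ctu * win;   /* raster order: fetch only the new
                                          columns; the rest overlaps the
                                          previous CTU's window            */

    printf("pixels per CTU, no reuse    : %ld\n", no_reuse); /* 36864 */
    printf("pixels per CTU, raster reuse: %ld\n", h_reuse);  /* 12288 */
    return 0;
}
```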
11.2.2.1 Pipeline Granularity
Considering the precedence constraint on CABAC input data, there are two options for arranging CABAC in the pipeline. The first option is to place CABAC in the CTU pipeline. The CABAC engine must then process the CTU data generated by the prediction engine strictly in order, and the throughput requirement for CABAC with CTU-level pipelining is the peak binary symbol (bin) rate per CTU. If this throughput cannot be reached, the whole pipeline stalls waiting for CABAC to complete, hurting overall performance. In the second option, CABAC is placed in a separate frame-level pipeline. Input data for CABAC is first stored in external memory; after the whole frame has passed through prediction and reconstruction, CABAC starts coding. This enables the prediction engine and the CABAC engine to process the CTUs of each frame in different orders. The throughput requirement for CABAC with frame-level pipelining is the peak bin rate per frame, or equivalently the average bin rate per CTU of the peak frame, which is generally much lower than the CTU-level requirement. With this arrangement, the CTU processing orders in the CABAC stage and in the other stages are independent, at the cost of extra external bandwidth.
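The gap between the two throughput requirements can be seen with a toy calculation; the per-CTU bin counts below are invented example values for a single frame, with one outlier CTU.

```c
#include <stdio.h>

/* Toy comparison of CABAC throughput requirements for the two pipeline
 * granularities; the bin counts are invented example values. */
int main(void)
{
    const int bins_per_ctu[] = { 800, 1200, 9000, 700, 1100, 900, 1300, 1000 };
    const int n = sizeof bins_per_ctu / sizeof bins_per_ctu[0];

    int peak = 0, total = 0;
    for (int i = 0; i < n; i++) {
        total += bins_per_ctu[i];
        if (bins_per_ctu[i] > peak) peak = bins_per_ctu[i];
    }

    /* CTU-level pipeline: must sustain the worst-case CTU to avoid stalls. */
    printf("required rate, CTU-level  : %d bins/CTU-slot\n", peak);      /* 9000 */
    /* Frame-level pipeline: only the frame's average per CTU matters.    */
    printf("required rate, frame-level: %d bins/CTU-slot\n", total / n); /* 2000 */
    return 0;
}
```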