11.2.2.2 Parallel Processing
The HEVC standard requires that the context probability updates in the CABAC
occur in raster scan order. This can be done with a single CABAC engine, or with
multiple CABAC engines by using multiple slices/tiles within a given frame. A
single CABAC engine achieves higher coding efficiency; however, only a single
CTU can be processed at a time, so the throughput is limited. The engine must
also sustain the peak bin rate per CTU, which can be quite high.
Alternatively, multiple CTUs can be processed in parallel using slices/tiles. This
comes at a cost of reduced coding efficiency since redundancy cannot be removed
across slices/tiles. If only a few slices/tiles are used, then the coding loss can be quite
low. Note that wavefront parallel processing, a new feature in HEVC, can also be
used by the CABAC to encode multiple CTU lines in parallel, with a lower coding
penalty than slices/tiles.
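
The throughput gain from wavefront parallel processing can be illustrated with a small, idealized scheduling model. This is only a sketch under stated assumptions, not a description of any particular encoder: every CTU is assumed to take one time step, one CABAC engine is assumed to be available per CTU row, and each row is assumed to start once the first two CTUs of the row above are finished. The function names and the 1080p example with 64x64 CTUs are hypothetical and chosen for illustration.

```python
import math


def wpp_start_step(row: int, col: int) -> int:
    """Earliest step at which CTU (row, col) can start in the idealized model.

    Assumes one step per CTU and a two-CTU lag between consecutive CTU rows,
    so each row starts two steps after the row above it.
    """
    return col + 2 * row


def total_steps_raster(width_ctus: int, height_ctus: int) -> int:
    """Steps needed when a single engine processes one CTU at a time."""
    return width_ctus * height_ctus


def total_steps_wpp(width_ctus: int, height_ctus: int) -> int:
    """Steps needed with one engine per CTU row (idealized assumption)."""
    # The last CTU starts at wpp_start_step(H - 1, W - 1) and takes one step.
    return wpp_start_step(height_ctus - 1, width_ctus - 1) + 1


# Hypothetical example: 1920x1080 frame with 64x64 CTUs.
width_ctus = math.ceil(1920 / 64)   # 30 CTUs per row
height_ctus = math.ceil(1080 / 64)  # 17 CTU rows

raster_steps = total_steps_raster(width_ctus, height_ctus)
wpp_steps = total_steps_wpp(width_ctus, height_ctus)
print(raster_steps, wpp_steps)
```

In this model the wavefront finishes in W + 2(H - 1) steps instead of W x H. A real design would likely use fewer engines than CTU rows, and CTU processing times vary, so the actual speedup would be lower.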
11.2.2.3 Data Locality
Another CTU scanning order that can be used in the encoder is the zigzag scanning
order, in which the prediction core and the reconstruction core operate on the CTUs
in zigzag order. However, the CABAC encoding still needs to happen in raster
scanning order to comply with the HEVC standard. Zigzag scanning order has good
data locality among both horizontal and vertical CTUs, whereas raster scan only
has good data locality among horizontal CTUs. Zigzag scanning order cooperates
well with the CABAC frame-level pipelining discussed in Sect. 11.2.2.1. The
difference from raster scan lies in the reference memory subsystem. Since data
locality among vertical CTUs is better, zigzag scanning order performs better when
the total on-chip memory size for reference frames is limited. Note that tiles, a new
feature in HEVC, also offer better vertical and horizontal data locality by doing
raster scan order within rectangular regions with widths smaller than the total frame
width.
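
The locality benefit of tiles can be made concrete by counting how many CTUs are processed between a CTU and the CTU directly below it. The sketch below is illustrative only: it assumes full-height tile columns of equal width, and it uses the processing-order gap between vertical neighbors as a rough proxy for how long shared data must stay on chip. The function names and frame dimensions are hypothetical.

```python
def raster_order(width: int, height: int) -> list[tuple[int, int]]:
    """CTU (row, col) coordinates in frame-wide raster scan order."""
    return [(r, c) for r in range(height) for c in range(width)]


def tile_column_order(width: int, height: int, tile_w: int) -> list[tuple[int, int]]:
    """Raster scan inside full-height tile columns, tiles visited left to right.

    Assumes tiles span the full frame height (a simplification).
    """
    order = []
    for tile_start in range(0, width, tile_w):
        tile_end = min(tile_start + tile_w, width)
        for r in range(height):
            for c in range(tile_start, tile_end):
                order.append((r, c))
    return order


def max_vertical_gap(order: list[tuple[int, int]]) -> int:
    """Largest processing-order distance between a CTU and the CTU below it."""
    position = {ctu: i for i, ctu in enumerate(order)}
    return max(
        position[(r + 1, c)] - position[(r, c)]
        for (r, c) in order
        if (r + 1, c) in position
    )


# Hypothetical frame of 30 x 17 CTUs, split into three 10-CTU-wide tile columns.
raster_gap = max_vertical_gap(raster_order(30, 17))
tile_gap = max_vertical_gap(tile_column_order(30, 17, 10))
print(raster_gap, tile_gap)
```

With frame-wide raster scan the vertical neighbor is a full row of CTUs away, while inside a tile column it is only one tile width away. This is why narrower scan regions can reduce the on-chip memory needed to reuse data vertically.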
To summarize, the CTU processing order should be considered along with the
various configurations of the parallel CABAC and the reference memory subsystem.
The best choice is a trade-off among bit rate increase, area cost, throughput, and
bandwidth.
11.3 Inter Prediction
In inter prediction, the temporal redundancy is reduced through motion estimation.
Motion estimation compares the current prediction unit (PU) with candidate blocks
at spatially neighboring positions in the reference frames, and chooses the one with
the least difference from the current PU. The displacement between the current PU and the
matching PU in the reference frames is signaled using a motion vector. The per-pixel