Block Structures and Parallelism Features in HEVC - High Efficiency Video Coding (HEVC)


Thus, the maximum speed-up factor using p processing units is typically p. The actual speed-up is influenced by various characteristics such as synchronization between threads, memory access, or memory characteristics. Assuming a hierarchical memory structure, an optimized implementation would therefore avoid main memory access as much as possible and rely on cache memory access instead, as cache typically has much faster access times than main memory, albeit at a smaller memory size.
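
As a rough illustration of the bound above, the following sketch computes the speed-up factor from a serial and a parallel run time. The function names and the simple timing model (perfectly divided work plus a fixed synchronization overhead) are assumptions made for this example and are not taken from the text.

```python
def speed_up(t_serial: float, t_parallel: float) -> float:
    """Speed-up factor: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def modelled_parallel_time(t_serial: float, p: int, sync_overhead: float) -> float:
    """Hypothetical model: work split evenly over p units plus a fixed overhead."""
    return t_serial / p + sync_overhead


if __name__ == "__main__":
    t1 = 8.0  # assumed serial run time in seconds
    for p in (1, 2, 4, 8):
        tp = modelled_parallel_time(t1, p, sync_overhead=0.5)
        s = speed_up(t1, tp)
        # Efficiency is the fraction of the ideal speed-up p that is reached.
        print(f"p={p}: speed-up {s:.2f}, efficiency {s / p:.2f}")
```

Under this model the speed-up equals p only when the overhead is zero. Any synchronization or memory cost pushes it below p.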

Different parallelization techniques have been introduced for improved utilization of computational resources in the implementation of video coding standards. In the following, only the most important ones are mentioned:

Picture-Level Parallelization: Picture-level parallelism consists of processing multiple pictures at the same time, provided that the temporal dependencies for motion-compensated prediction are satisfied. Picture-level parallelism is often sufficient for multi-core systems with a few cores. Because it is relatively simple to implement and does not incur coding efficiency losses, it has become the state of the art for software-based implementations of H.264 | MPEG-4 AVC. However, picture-level parallelism has a number of limitations. First, the parallelization scalability is determined by the lengths of the motion vectors and/or the size of the underlying group of pictures (GOP). Second, the workload of each core may be imbalanced because the picture encoding/decoding times can vary significantly. Finally, picture-level parallelism increases the processing frame rate but does not improve latency.
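
The dependency constraint can be sketched as follows. The function `parallel_waves` and the dictionary that maps each picture to its reference pictures are hypothetical names chosen for this example. The sketch groups pictures into waves so that every picture in a wave has all of its reference pictures in earlier waves. It treats a picture as ready only once its reference pictures are complete, so the dependence on motion vector length is not modelled.

```python
def parallel_waves(references: dict[int, list[int]]) -> list[list[int]]:
    """Group pictures into waves that could be processed concurrently.

    `references` maps a picture number to the pictures it predicts from.
    """
    done: set[int] = set()
    remaining = dict(references)
    waves: list[list[int]] = []
    while remaining:
        # A picture is ready once all of its reference pictures are finished.
        ready = sorted(pic for pic, refs in remaining.items()
                       if all(ref in done for ref in refs))
        if not ready:
            raise ValueError("cyclic or unsatisfied reference dependency")
        waves.append(ready)
        done.update(ready)
        for pic in ready:
            del remaining[pic]
    return waves


if __name__ == "__main__":
    # Assumed hierarchical prediction structure with a GOP of size 8.
    gop = {0: [], 8: [0], 4: [0, 8], 2: [0, 4], 6: [4, 8],
           1: [0, 2], 3: [2, 4], 5: [4, 6], 7: [6, 8]}
    for index, wave in enumerate(parallel_waves(gop)):
        print(f"wave {index}: pictures {wave}")
```

In this assumed structure, the number of pictures per wave is uneven, and a chain in which each picture references only its predecessor would yield one picture per wave. This is one way to see why the GOP structure limits how far picture-level parallelism scales.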

Slice-Level Parallelization: In HEVC and H.264 | MPEG-4 AVC, each picture can be partitioned into slices, as described in Sect. 3.3.1. All (regular) slices within a picture are independent of each other, except for potential dependencies arising from in-loop filtering across slice boundaries. Therefore, slices can be used for parallel processing. Slice-level parallelism, however, has a number of disadvantages. Although slices are completely independent of each other in terms of prediction, transform and entropy coding, in-loop filtering may be applied across slice boundaries. For H.264 | MPEG-4 AVC, it may be required to perform deblocking of the complete picture using a single processing unit, whereas HEVC, in principle, allows in-loop filtering to be performed on CTU rows in parallel. Moreover, as mentioned above, multiple slices reduce the coding efficiency significantly due to the restrictions of in-picture prediction and entropy coding across slice boundaries. Due to these disadvantages, exploiting slice-level parallelism is only advisable when the number of slices per picture is strictly limited [33].
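
The two-stage structure described above can be sketched with a thread pool: slices are processed independently, and a separate pass then filters across slice boundaries. The functions `decode_slice`, `filter_across_boundaries` and `decode_picture` are hypothetical stand-ins operating on toy integer samples, not a real codec API. In CPython, threads show the structure rather than delivering a real speed-up for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_slice(slice_payload: list[int]) -> list[int]:
    """Stand-in for prediction, transform and entropy decoding of one slice.

    It reads only its own payload, mirroring the independence of slices.
    """
    return [value * 2 for value in slice_payload]


def filter_across_boundaries(slices: list[list[int]]) -> list[int]:
    """Stand-in for in-loop filtering across slice boundaries.

    The two samples adjacent to each boundary are replaced by their mean,
    which requires both neighbouring slices to be decoded already.
    Slices are assumed to be non-empty.
    """
    picture = [sample for decoded in slices for sample in decoded]
    boundary = 0
    for decoded in slices[:-1]:
        boundary += len(decoded)
        mean = (picture[boundary - 1] + picture[boundary]) // 2
        picture[boundary - 1] = picture[boundary] = mean
    return picture


def decode_picture(slice_payloads: list[list[int]], workers: int = 4) -> list[int]:
    """Decode all slices concurrently, then filter the assembled picture."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        decoded = list(pool.map(decode_slice, slice_payloads))
    return filter_across_boundaries(decoded)


if __name__ == "__main__":
    print(decode_picture([[1, 2], [3, 4], [5, 6]]))
```

The filtering pass here runs on a single unit after all slices are complete, which corresponds to the whole-picture deblocking case mentioned for H.264 | MPEG-4 AVC. It is the stage that cannot be distributed over slices in this simple scheme.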

Block-Level Parallelization: In hardware-based implementations of H.264 | MPEG-4 AVC, for example, a macroblock-level pipeline is very widely used. This kind of block-level parallelization technique is based on using heterogeneous processing cores, where one core is dedicated to entropy coding, one to in-loop filtering, one to intra prediction and so on. In this way, macroblocks are processed concurrently on the different cores. Note, however, that efficient parallel processing of macroblocks may require an elaborate scheduling
