algorithm for determining the order of macroblock processing, given their multiple
spatial dependencies. Another approach to block-level parallelism is given
by the wavefront scheduling approach [40], where macroblocks are grouped in
a wavefront-like manner to ensure that a sufficient number of macroblocks are
available, as dictated by the spatial dependencies between adjacent macroblocks.
At the same time, all macroblocks belonging to the same “wavefront” can be
processed concurrently. Furthermore, macroblocks of different pictures can be
processed in parallel provided that the temporal dependencies due to motion-
compensated prediction are handled correctly [28]. Entropy decoding, however,
can only be parallelized at the slice level and therefore it has to be decoupled
from macroblock or CTU reconstruction. Although this approach can scale up
to multi-core architectures, it has some limitations too. First, the decoupling of
entropy decoding and reconstruction increases the memory usage. Furthermore,
this strategy only reduces the decoding time of a picture in the reconstruction
stage but not in the entropy decoding stage. Consequently, a single-threaded
entropy decoding step itself may be the bottleneck and the limiting factor of the
overall throughput.
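
The wavefront grouping described above can be sketched as follows. This is a minimal illustration and is not taken from [40]: it assumes that each macroblock depends on its left, top-left, top, and top-right neighbours, so that macroblock (x, y) becomes ready at step x + 2y. The function name wavefront_groups is hypothetical.

```python
from collections import defaultdict


def wavefront_groups(width_mbs, height_mbs):
    """Group macroblock coordinates (x, y) into wavefronts.

    Assumption: a macroblock depends on its left, top-left, top and
    top-right neighbours. Under that assumption every dependency of
    (x, y) has a smaller value of x + 2*y, so all macroblocks sharing
    the same value can be processed concurrently.
    """
    groups = defaultdict(list)
    for y in range(height_mbs):
        for x in range(width_mbs):
            groups[x + 2 * y].append((x, y))
    # Return the wavefronts in the order in which they become ready.
    return [groups[step] for step in sorted(groups)]


if __name__ == "__main__":
    for step, front in enumerate(wavefront_groups(4, 3)):
        print(step, front)
```

For a picture that is 4 macroblocks wide and 3 macroblocks high, this yields 8 wavefronts, none of which contains more than 2 macroblocks. This illustrates why a sufficiently large picture is needed before the approach keeps many cores busy.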
In order to overcome the limitations of the parallelization strategies employed
in H.264 | MPEG-4 AVC, HEVC provides VCL-based coding tools that are
specifically designed to enable processing on high-level parallel architectures. Two
new tools aiming at facilitating high-level parallel processing have been included in
the HEVC standard [9, 10, 18]:
• Wavefront Parallel Processing (WPP): A parallel processing approach along
the wavefront scheduling principle, which is based on a partitioning of the picture
into CTU rows such that the dependencies between CTUs of different partitions,
both in terms of predictive coding and entropy coding, are preserved to a large
extent.
• Tiles: A picture partitioning mechanism similar to slices, which is based on a
flexible subdivision of the picture into rectangular regions of CTUs such that
coding dependencies between CTUs of different partitions are prohibited.
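
As a rough illustration of the tile partitioning, the sketch below maps a CTU position to the rectangular tile that contains it and checks whether two CTUs lie in the same tile. The representation of the tile grid as lists of starting CTU columns and rows, as well as the names tile_index and same_tile, are assumptions made for this example and are not syntax elements of the standard.

```python
from bisect import bisect_right


def tile_index(ctu_x, ctu_y, col_starts, row_starts):
    """Return the raster-order index of the tile containing a CTU.

    col_starts / row_starts are sorted lists holding the first CTU
    column / row of each tile column / row (hypothetical layout).
    """
    col = bisect_right(col_starts, ctu_x) - 1
    row = bisect_right(row_starts, ctu_y) - 1
    return row * len(col_starts) + col


def same_tile(a, b, col_starts, row_starts):
    """True if CTUs a and b, given as (x, y), lie in the same tile.

    Only in that case may b serve as a coding dependency for a.
    """
    return (tile_index(*a, col_starts, row_starts)
            == tile_index(*b, col_starts, row_starts))


if __name__ == "__main__":
    # A picture of 8 x 6 CTUs split into a 2 x 2 grid of tiles.
    cols, rows = [0, 4], [0, 3]
    print(tile_index(4, 2, cols, rows))
    print(same_tile((4, 2), (3, 2), cols, rows))
```

In this example the CTU at (4, 2) and its left neighbour at (3, 2) fall into different tiles, so the neighbour would be treated as unavailable for prediction, whereas under WPP the same neighbour would remain usable.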
Both of these tools allow subdivision of each picture into multiple partitions that
can be processed in parallel. Each partition contains an integer number of CTUs that
may or may not have dependencies on CTUs of other partitions. When WPP or tiles
are enabled, typically for each partition a separate slice segment subset is used such
that the corresponding entry point offsets (in the slice segment header) indicate the
start positions of all picture partition substreams (except for the first substream) in
the slice segment. This signaling is necessary so that each core can immediately access
the partition it has been assigned to decode. More details about parallel partition access are given
in Sect. 3.3.2.3 below.
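
How a decoder might use the entry point offsets can be sketched as follows. The sketch is simplified: it assumes that the offsets have already been parsed from the slice segment header as byte counts, with each offset giving the length of the preceding substream, and the function names are hypothetical.

```python
from itertools import accumulate


def substream_starts(entry_point_offsets):
    """Return the start byte of every substream in the slice segment data.

    The first substream has no signaled entry point and starts at
    byte 0; each further start is the running sum of the offsets.
    """
    return [0] + list(accumulate(entry_point_offsets))


def split_substreams(payload, entry_point_offsets):
    """Cut the slice segment data into one bytes object per partition."""
    starts = substream_starts(entry_point_offsets)
    ends = starts[1:] + [len(payload)]
    return [payload[s:e] for s, e in zip(starts, ends)]


if __name__ == "__main__":
    data = bytes(range(10))
    print(substream_starts([3, 4]))
    print([len(p) for p in split_substreams(data, [3, 4])])
```

With the start positions known up front, each substream can be handed to a different core without first parsing the substreams that precede it.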
In the HEVC Main, Main 10, and Main Still Picture profiles, only one of the two tools
can be used at a time, although the entry point signaling design would allow
for their co-existence in future profiles.