Fig. 11.15 HCMD pipeline architecture block diagram
[Figure: blocks labeled 2D-Tree Parallel IME; Fast Intra Prediction; PU-Mode Pre-decision; FME (16x16 CU), FME (32x32 CU), FME (64x64 CU); 8x8 DCT, 16x16 DCT, 32x32 DCT; High Complexity Mode Decision]
11.6.3 Hardware-Oriented Two-Step RDO Algorithm
In this section, we present a two-step mode decision flow for hardware. In the
literature, various coding tree pruning algorithms have been proposed to reduce
the number of full RDO evaluations needed to determine CU depth [16, 26, 27, 29].
Instead of a hard threshold, [35] proposes a fast CU splitting and pruning scheme
based on Bayes decision rules and a Gaussian model of the RD-cost distribution.
For intra prediction, the most probable modes (MPMs) derived from neighboring
blocks are added as alternative candidates for full RDO, which improves mode
decision quality when rough mode decision yields only a limited number of
candidates [42]. Intra CU depth traversal can also be terminated early based on
the neighboring CU modes and the block size relationship between TU and PU [14].
With these early termination methods, only candidates whose fast RDO costs are
good enough are selected to go through the full RDO process. The final mode is
then chosen from the full RDO results.
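The two-step flow described above can be illustrated with a minimal software sketch. It is not the hardware design of this chapter: the names `Candidate`, `rd_cost`, `two_step_mode_decision`, and the parameter `keep` are hypothetical, the cost values are made up for illustration, and the only standard ingredient is the Lagrangian cost J = D + lambda * R. A cheap cost ranks all candidates, and only the best few are passed to the expensive full RDO evaluation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical mode candidate, e.g. a PU partition of one CU."""
    name: str


def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def two_step_mode_decision(
    candidates: Sequence[Candidate],
    fast_cost: Callable[[Candidate], float],
    full_cost: Callable[[Candidate], float],
    keep: int = 2,
) -> Candidate:
    """Prune with a cheap cost estimate, then decide with full RDO."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Step 1 (fast RDO): rank every candidate by its cheap cost estimate
    # and keep only the `keep` best ones.
    survivors = sorted(candidates, key=fast_cost)[:keep]
    # Step 2 (full RDO): evaluate the expensive cost on the survivors only
    # and take the final mode from that result.
    return min(survivors, key=full_cost)


# Illustrative numbers only (assumed, not taken from the text).
modes = [Candidate("2Nx2N"), Candidate("2NxN"), Candidate("Nx2N"), Candidate("NxN")]
fast_table = {"2Nx2N": 100.0, "2NxN": 90.0, "Nx2N": 120.0, "NxN": 150.0}
full_table = {
    "2Nx2N": rd_cost(80.0, 10.0, 1.5),
    "2NxN": rd_cost(85.0, 12.0, 1.5),
    "Nx2N": rd_cost(70.0, 8.0, 1.5),
    "NxN": rd_cost(60.0, 30.0, 1.5),
}

best = two_step_mode_decision(
    modes,
    fast_cost=lambda c: fast_table[c.name],
    full_cost=lambda c: full_table[c.name],
    keep=2,
)
```

With these numbers, `best` is the 2Nx2N candidate: Nx2N has the lowest full cost, but it was pruned in step 1 because its fast cost ranked third. This is the trade-off the pruning threshold controls: a smaller `keep` saves full RDO evaluations at the risk of discarding the true optimum.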
The parallelization cost per computation in CABAC hardware is much higher
than in other modules. This is because most of the CABAC cost comes from the
context memory, whose contents change according to the chosen mode. As a result,
it can hardly be shared, and the parallelization cost required to reach the
target throughput is rather high. Many fast algorithms instead use fast RDO for
the final mode decision. Most previous work on H.264/AVC encoders applies this
method with various fast RDO algorithms. A previous low-power encoder [9] in
H.264/AVC
 