full RDO process. This is done in the proposed HCMD hardware, which consists of an
SSD unit and a CABAC bit rate estimator. The two parts are discussed in
Sects. 11.6.4.1 and 11.6.4.2, respectively.
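The following sketch assumes the standard Lagrangian RDO cost J = D + λ·R. Under that assumption, the SSD unit supplies the distortion D, the bit rate estimator supplies the rate R, and the candidate with the lowest J is selected. The `Candidate` type, the mode labels, the SSD and bit values, and λ are all hypothetical illustration values, not parameters of the HCMD design.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    mode: str     # hypothetical mode label
    ssd: int      # distortion reported by the SSD unit
    bits: float   # rate reported by the bit rate estimator


def rd_cost(c: Candidate, lam: float) -> float:
    # Assumed Lagrangian rate-distortion cost: J = D + lambda * R
    return c.ssd + lam * c.bits


def choose_mode(candidates: list[Candidate], lam: float) -> Candidate:
    # Final mode decision: keep the candidate with the smallest RD cost
    return min(candidates, key=lambda c: rd_cost(c, lam))


if __name__ == "__main__":
    cands = [
        Candidate("merge", ssd=5200, bits=12.0),
        Candidate("inter_2Nx2N", ssd=4100, bits=40.0),
        Candidate("intra_DC", ssd=6100, bits=30.5),
    ]
    print(choose_mode(cands, lam=30.0).mode)  # prints inter_2Nx2N
```

Both inputs to J must be produced for every surviving mode. This is why the SSD unit and the bit rate estimator are sized per mode in the subsections that follow.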
11.6.4.1 SSD Cost Unit
Since SSD is computed only for the final mode decision in HCMD, it does not need the
high throughput of the SAD/SATD unit in the prediction stage, so a direct
implementation is feasible. Consider the following example. If PU-level early mode
decision is applied, six modes need to pass through the HCMD process. Assume we are
encoding an 8K UHDTV (7,680 × 4,320) sequence at 30 fps with a 300 MHz clock and a
CTU size of 64 × 64. A frame then contains 120 × 68 = 8,160 CTUs, so each CTU has a
budget of 300 MHz / (8,160 × 30) ≈ 1,225, i.e., about 1,200 cycles. We may use four
multiply-and-sum units per mode and compute the SSD for all six modes in parallel.
A 64 × 64 block (4,096 samples at four samples per cycle) then takes 1,024 cycles,
which fits within the budget, so the throughput is acceptable.
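The cycle budget in the example above can be checked with a short script. The script also models a single mode's SSD datapath with four multiply-and-sum lanes. This is a behavioral sketch, not the hardware design. It assumes that the six modes run on independent lanes in parallel, so the per-CTU latency equals one mode's cycle count. The function names are hypothetical.

```python
import math

# Example parameters from the text (8K UHDTV, 30 fps, 300 MHz, 64x64 CTU)
WIDTH, HEIGHT = 7680, 4320
FPS = 30
CLOCK_HZ = 300_000_000
CTU_SIZE = 64
LANES_PER_MODE = 4  # multiply-and-sum units per mode
NUM_MODES = 6       # modes surviving PU-level early mode decision


def cycle_budget_per_ctu() -> float:
    # A partial CTU row at the bottom edge still occupies a full CTU slot
    ctus_per_frame = math.ceil(WIDTH / CTU_SIZE) * math.ceil(HEIGHT / CTU_SIZE)
    return CLOCK_HZ / (ctus_per_frame * FPS)


def ssd_with_lanes(orig: list[int], recon: list[int],
                   lanes: int = LANES_PER_MODE) -> tuple[int, int]:
    """Model one mode's SSD datapath: `lanes` squared differences per cycle."""
    total, cycles = 0, 0
    for start in range(0, len(orig), lanes):
        # One cycle: each lane squares one difference, then the lanes are summed
        o_chunk = orig[start:start + lanes]
        r_chunk = recon[start:start + lanes]
        total += sum((o - r) ** 2 for o, r in zip(o_chunk, r_chunk))
        cycles += 1
    return total, cycles


if __name__ == "__main__":
    n = CTU_SIZE * CTU_SIZE  # 4,096 samples in a 64x64 block
    orig = [i % 256 for i in range(n)]
    recon = [(i + 1) % 256 for i in range(n)]
    _, cycles = ssd_with_lanes(orig, recon)
    print(f"budget ~ {cycle_budget_per_ctu():.0f} cycles, SSD needs {cycles} cycles")
```

Doubling `LANES_PER_MODE` would halve the SSD cycle count at the cost of more multipliers. With the budget already met at four lanes, the smaller datapath is the natural choice.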
11.6.4.2 CABAC Bit Rate Estimator
CABAC is the only entropy coder in HEVC, chosen for its coding performance.
However, CABAC has strong sequential dependencies, is difficult to parallelize, and
is costly to implement. In HCMD, multiple instances of CABAC are used for bit
estimation, so doing bit estimation with CABAC requires a large area. The main cause
is that CABAC uses a large number of contexts to achieve accurate probability
estimation, and each context stores one {state, MPS} pair in memory. This large
{state, MPS} memory makes the state stage expensive. Because each CABAC instance
must track the states for its own mode, multiple copies of the CABAC state storage
are required. The state stage therefore occupies most of the CABAC area, which makes
this approach inefficient to implement.
Other methods use regression or table lookup to predict the bit rate, and the bit
rate can be predicted accurately by table lookup [ 21 ]. JCTVC-G763 [ 1 ] proposes a
table-based CABAC bit counting algorithm. It accumulates fractional bit counts,
ranging from 0.008 to 7.497 bits, according to the current state. However, it still
relies on the CABAC states. It must therefore still traverse the CABAC states and
requires separate state storage for each HCMD mode. The sequential nature of CABAC
also limits the throughput of bit counters that depend on CABAC states.
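The following sketch shows, in the spirit of table-based bit counting, why such a counter still depends on CABAC state. It is not the JCTVC-G763 table. As an assumption, the fractional costs come from the idealized exponential model that the H.264/HEVC state machine approximates: p_LPS(σ) = 0.5·α^σ with α = (0.01875/0.5)^(1/63). The standard transIdxLps/transIdxMps tables are replaced by the corresponding idealized update, so the costs here do not span exactly 0.008 to 7.497 bits. `ContextModel`, `count_bits`, and `next_state` are hypothetical names.

```python
import math

NUM_STATES = 63                    # adaptive states 0..62
P_MIN = 0.01875                    # LPS probability of state 63 in the idealized model
ALPHA = (P_MIN / 0.5) ** (1 / 63)  # about 0.9492


def p_lps(state: int) -> float:
    return 0.5 * ALPHA ** state


# Fractional bit costs, precomputed once per state (the lookup "table")
BITS_MPS = [-math.log2(1.0 - p_lps(s)) for s in range(NUM_STATES)]
BITS_LPS = [-math.log2(p_lps(s)) for s in range(NUM_STATES)]


def next_state(state: int, is_mps: bool) -> int:
    # Idealized adaptation: p <- a*p after an MPS, p <- a*p + (1 - a) after an LPS
    p = p_lps(state)
    p = ALPHA * p if is_mps else ALPHA * p + (1.0 - ALPHA)
    s = round(math.log(p / 0.5) / math.log(ALPHA))  # nearest state index
    return max(0, min(NUM_STATES - 1, s))


class ContextModel:
    """One context's {state, MPS} pair; each HCMD mode would need its own copy."""

    def __init__(self, state: int = 0, mps: int = 0):
        self.state = state
        self.mps = mps


def count_bits(bins: list[int], ctx: ContextModel) -> float:
    """Accumulate fractional bits for bins coded with one context, updating it."""
    total = 0.0
    for b in bins:
        if b == ctx.mps:
            total += BITS_MPS[ctx.state]
            ctx.state = next_state(ctx.state, True)
        else:
            total += BITS_LPS[ctx.state]
            if ctx.state == 0:
                ctx.mps = 1 - ctx.mps  # an LPS at the equiprobable state swaps the MPS
            ctx.state = next_state(ctx.state, False)
    return total
```

The lookup itself is cheap, but `count_bits` reads and updates the context it is given. Evaluating six candidate modes would therefore need six independent copies of every `ContextModel`, and each bin depends on the state left by the previous one.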
To reduce the cost of CABAC bit estimation, we need to resolve the state issue. We
present two hardware-oriented algorithms: bypass-based bit estimation and the
Context-Fixed Binary Arithmetic Coding (CFBAC) algorithm. In bypass-based bit
estimation, CABAC is not actually performed. We only sum the number of bins output
by the binarization process. This is equivalent to coding every bin in bypass mode,
where each bin costs exactly one bit. Since no bins are passed to the arithmetic
encoder, no state needs to be stored, and the state memory cost is saved.
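The following is a minimal sketch of bypass-based estimation. It assumes two binarizations used in HEVC, truncated unary and k-th order Exp-Golomb. Which syntax elements use which binarization, and the values in the usage example, are hypothetical. The estimate simply counts bins, since a bypass-coded bin costs one bit.

```python
def truncated_unary(value: int, c_max: int) -> list[int]:
    """Truncated unary: `value` ones, then a terminating 0 unless value == c_max."""
    bins = [1] * value
    if value < c_max:
        bins.append(0)
    return bins


def exp_golomb_k(value: int, k: int) -> list[int]:
    """k-th order Exp-Golomb binarization: unary-style prefix, then a k-bit suffix."""
    bins = []
    abs_v = abs(value)
    while abs_v >= (1 << k):
        bins.append(1)
        abs_v -= 1 << k
        k += 1
    bins.append(0)
    for i in range(k - 1, -1, -1):
        bins.append((abs_v >> i) & 1)
    return bins


def bypass_bit_estimate(bin_strings: list[list[int]]) -> int:
    """Every bin counts as exactly one bit, so no context state is read or written."""
    return sum(len(bins) for bins in bin_strings)


if __name__ == "__main__":
    # Hypothetical syntax-element values for one candidate mode
    elements = [truncated_unary(3, c_max=5), exp_golomb_k(3, k=0), exp_golomb_k(10, k=1)]
    print(bypass_bit_estimate(elements))  # prints 15
```

Accuracy is the price. Context-coded bins can cost more or less than one bit, so a pure bin count only approximates the true CABAC rate. In exchange, no {state, MPS} memory is needed at all.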
For the CFBAC algorithm, we aim to reduce the state memory cost by sharing the