Encoder Hardware Architecture for HEVC - High Efficiency Video Coding (HEVC)

Modes in 64×64-sized PUs are skipped in fast intra prediction. The main

advantage of large PU sizes lies in inter mode. As the frame size grows,

regions that belong to the same object and move uniformly can be predicted

well and encoded economically with a large block size. However, large PUs are

less useful in intra mode. In practice, a 64×64-sized PU is rarely chosen (less than 1 %).

Therefore, 64×64-sized intra prediction is not important and can be removed.

Removing it also saves the additional set of DCT and

SRAM that 64×64-sized intra prediction would require.

11.4.2 Hybrid Open/Closed Loop Intra Prediction

The reference pixels in intra prediction are obtained from the previously

reconstructed neighboring PUs. In the HM software, the selected modes and CTU

partitions are decided serially by a full RDO process. However, a full RDO process in

hardware inevitably results in high latency because of the serial processing nature

of CABAC, which becomes the primary limiting factor of the hardware design. It is

hard to process all PUs serially as in HM and still achieve sufficient processing throughput.

In this work, we choose to remove the dependency to avoid these problems.

The hybrid open-closed loop intra prediction algorithm [18] is useful in this case. Based

on the fact that original pixels are close to reconstructed ones at low QP, this method

uses the original pixels to replace the reconstructed boundary ones. In other words,

if the reconstructed pixels are not yet ready, the original ones are used instead. In

addition, the intra/inter dependency in intra prediction is also removed in this case.

Thus, the reconstruction dependency no longer exists in this algorithm. With the hybrid

open-closed loop scheme, the intra mode for each block can be calculated in parallel

without waiting for reconstruction. With all the modifications to intra prediction, the

cost is a 1.03 % BD-Rate increase in the low delay P configuration.
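The reference selection rule described above can be illustrated with a minimal sketch. It is not the implementation from [18]; the function names, the per-sample `ready` mask, and the use of DC prediction as the consumer are all assumptions made for illustration. The sketch also simplifies HEVC's reference set to the top row and left column only, and assumes the block does not touch the picture border.

```python
def reference_sample(orig, recon, ready, x, y):
    # Closed loop when the reconstructed sample is available,
    # open loop (original sample) when it is not yet ready.
    return recon[y][x] if ready[y][x] else orig[y][x]


def gather_references(orig, recon, ready, x0, y0, size):
    """Collect the top row and left column of neighbours of a size x size block.

    Hypothetical helper: assumes x0 > 0 and y0 > 0, and ignores the
    top-right, bottom-left and corner samples that HEVC also uses.
    """
    top = [reference_sample(orig, recon, ready, x0 + i, y0 - 1)
           for i in range(size)]
    left = [reference_sample(orig, recon, ready, x0 - 1, y0 + j)
            for j in range(size)]
    return top, left


def dc_predict(top, left):
    # DC value: rounded integer mean of the neighbouring samples.
    n = len(top) + len(left)
    return (sum(top) + sum(left) + n // 2) // n


# Toy 8x8 picture: original samples are 100, reconstructed samples are 102.
# Only the upper four rows have been reconstructed so far.
W = H = 8
orig = [[100] * W for _ in range(H)]
recon = [[102] * W for _ in range(H)]
ready = [[y < 4] * W for y in range(H)]

# 4x4 block at (4, 4): the row above is reconstructed, the column to the
# left is not, so the left neighbours fall back to the original samples.
top, left = gather_references(orig, recon, ready, 4, 4, 4)
dc = dc_predict(top, left)
```

In this toy case the top neighbours come from the reconstructed picture and the left neighbours from the original one, so the block can be evaluated without waiting for the left PU to be reconstructed. The mismatch between the two sources is what the reported BD-Rate increase pays for.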

11.5 Transform and Quantization

After the intra and inter prediction with early mode decision, the transform is

performed. The corresponding block of residual samples is obtained as the

difference between the original input samples and the predicted samples. It is then

further processed by DCT-based coding with variable transform sizes. For the 4×4

luma intra-prediction residual, the DST is used instead of the DCT. After the transform,

the residuals are quantized and sent to entropy coding. Since the transform is rather

complex in HEVC, quantization is done in a separate stage. The transform size

can be 4×4, 8×8, 16×16, or 32×32. Each transform unit (TU) can use a large

transform size or be further subdivided into smaller transform sizes by a residual

quadtree structure. The main reason for supporting different transform sizes is to

adapt the transform to the varying space-frequency characteristics of the residual
