Modes for 64×64 PUs are skipped in fast intra prediction. The main advantage
of large PU sizes is in inter mode: as the frame size grows, regions that belong
to the same object and move uniformly can be predicted well and encoded
economically with large blocks. Large PUs are less useful in intra mode, however.
In practice, the 64×64 PU is rarely chosen (less than 1 %). The 64×64 intra
prediction is therefore not important and can be removed. Removing it also saves
the additional set of DCT and SRAM that 64×64 intra prediction would require.
11.4.2 Hybrid Open/Closed Loop Intra Prediction
The reference pixels in intra prediction are obtained from the previously
reconstructed neighboring PUs. In the HM software, the selected modes and CTU
partitions are decided serially by a full RDO process. In hardware, however, a
full RDO process inevitably results in high latency because of the serial
processing nature of CABAC, and this becomes the primary limiting factor of the
hardware design. It is hard to process all PUs serially as in HM and still achieve
enough processing throughput. In this work, we choose to remove the dependency to
avoid these problems. The hybrid open-closed loop intra prediction algorithm [18]
is useful in this case. Based on the fact that original pixels are close to
reconstructed ones at low QP, this method uses the original pixels in place of the
reconstructed boundary pixels. In other words, if the reconstructed pixels are not
yet ready, the original ones are used instead. The intra/inter dependency in intra
prediction is also removed in this case, so no dependency on reconstruction
remains in this algorithm. With the hybrid open-closed loop scheme, the intra mode
for each block can be calculated in parallel without waiting for reconstruction.
With all the modifications to intra prediction, the cost is a 1.03 % BD-Rate
increase in the low delay P configuration.
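The fallback rule described above (use a reconstructed neighbor when it is ready, otherwise the original pixel at the same position) can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the implementation from [18] or from HM: the helper name `reference_row`, the `recon_ready` mask, and the frame layout as nested lists are all hypothetical, and the mid-gray value 128 used at the frame border assumes 8-bit video.

```python
def reference_row(orig, recon, recon_ready, x0, y0, n):
    """Return the n reference pixels directly above the block at (x0, y0).

    orig, recon: 2-D lists of pixel values (one inner list per frame row).
    recon_ready: 2-D list of booleans, True where recon holds a final value.
    Hypothetical helper for illustration only.
    """
    if y0 == 0:
        # No row exists above the frame: fall back to mid-gray (8-bit assumption).
        return [128] * n
    y = y0 - 1
    # Closed loop where reconstruction is ready, open loop (original) elsewhere.
    return [
        recon[y][x] if recon_ready[y][x] else orig[y][x]
        for x in range(x0, x0 + n)
    ]


# Toy 2x4 frame: only the first two pixels of row 0 are reconstructed so far.
orig = [[10, 11, 12, 13], [20, 21, 22, 23]]
recon = [[9, 11, 0, 0], [0, 0, 0, 0]]
ready = [[True, True, False, False], [False, False, False, False]]

row = reference_row(orig, recon, ready, x0=0, y0=1, n=4)
print(row)  # [9, 11, 12, 13]
```

Because the function never waits for a missing reconstructed value, the reference row for every block can be formed as soon as the original frame is available, which is what allows the intra mode decisions to run in parallel.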
11.5 Transform and Quantization
After intra and inter prediction with early mode decision, the transform is
performed. The block of residual samples is obtained as the difference between
the original input samples and the predicted samples. It is then further
processed by DCT-based coding with variable transform sizes. For the 4×4 luma
intra-prediction residual, the DST is used instead of the DCT. After the
transform, the residues are quantized and sent to entropy coding. Since the
transform is rather complex in HEVC, quantization is done in a different stage.
The transform size can be 4×4, 8×8, 16×16, or 32×32. Each transform unit (TU) can
use the large transform size or be further subdivided into smaller transform
sizes by a residual quadtree structure. The main reason for supporting different
transform sizes is to adapt the transform to the varying space-frequency
characteristics of the residual