Fig. 10.1 System pipelining for the HEVC decoder. The Coeff buffer saves 20 kB of SRAM through TU pipelining. Connections to the Line Buffers are omitted in the figure for clarity (see Fig. 10.3 for details)
The Variable-sized Pipeline Block (VPB) is as tall as the CTU, but its width is fixed at 64 for a unified control flow. Because this makes the VPB larger than the CTU (for 32×32 and 16×16 CTUs), motion compensation can predict a larger block of luma pixels before predicting the chroma pixels. This reduces the number of switches between luma and chroma memory accesses, which, as explained later in Sect. 10.6, benefits DRAM latency.
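To make the sizing rule concrete, the following C++ sketch derives the VPB dimensions from the CTU size; the struct and function names are illustrative, not from the chapter.

```cpp
#include <cstdio>

// Sketch of the VPB sizing rule described above: the VPB is as tall
// as the CTU, but its width is fixed at 64, so a 64x64 CTU maps to
// one VPB, a 32x32 CTU to a 64x32 VPB (two CTUs side by side), and
// a 16x16 CTU to a 64x16 VPB (four CTUs).
struct PipelineBlock {
    int width;   // luma samples
    int height;  // luma samples
};

PipelineBlock vpbForCtu(int ctuSize) {
    PipelineBlock vpb;
    vpb.width  = 64;       // fixed width for a unified control flow
    vpb.height = ctuSize;  // VPB is as tall as the CTU
    return vpb;
}

int main() {
    const int ctuSizes[] = {64, 32, 16};
    for (int ctu : ctuSizes) {
        PipelineBlock vpb = vpbForCtu(ctu);
        printf("CTU %2dx%-2d -> VPB %2dx%-2d (%d CTU(s) per VPB)\n",
               ctu, ctu, vpb.width, vpb.height, vpb.width / ctu);
    }
    return 0;
}
```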
10.2.2 Split System Pipeline
To deal with the variable latency of the cache+DRAM memory system, elastic pipelining can be used between the entropy decoder, which sends read requests to the cache, and prediction, which reads data from the cache. As a result, the system pipeline can be broken into two groups. The first group contains the entropy decoder, while the second contains the inverse transform, prediction, and the subsequent in-loop filters. This scheme is shown in Fig. 10.1.
The entropy decoder uses collocated motion vectors from previously decoded pictures for motion vector prediction. A separate pipeline stage, ColMV DMA, is added before the entropy decoder to read collocated motion vectors from DRAM. This isolates the entropy decoder from the variable DRAM latency. Similarly, an extra stage, reconstruction DMA, is added after the in-loop filters in the second pipeline group to write fully reconstructed pixels back to DRAM. Within each group, the processing engines are pipelined at VPB granularity, as shown in Fig. 10.2. Pipelining across the groups is explained next.
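The VPB-granularity pipelining inside a group can be illustrated with a small scheduling sketch; the stage names follow the text, but the loop and its timing model are simplified assumptions.

```cpp
#include <cstdio>

// Sketch of VPB-granularity pipelining within the first group: in
// each time step, stage s processes VPB (t - s), so all stages run
// in parallel on consecutive VPBs once the pipeline is full.
int main() {
    const char* stages[] = {"ColMV DMA", "Entropy Decoder"};
    const int numStages = 2;
    const int numVpbs = 4;

    for (int t = 0; t < numVpbs + numStages - 1; ++t) {
        printf("step %d:", t);
        for (int s = 0; s < numStages; ++s) {
            int vpb = t - s;  // each stage lags the previous by one VPB
            if (vpb >= 0 && vpb < numVpbs)
                printf("  [%s -> VPB %d]", stages[s], vpb);
        }
        printf("\n");
    }
    return 0;
}
```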
 