Fig. 10.1 System pipelining for the HEVC decoder. The Coeff buffer saves 20 kB of SRAM through TU pipelining. Connections to the Line Buffers are omitted in the figure for clarity (see Fig. 10.3 for details)
The Variable-sized Pipeline Block (VPB) is as tall as the CTU, but its width is fixed at 64 for a unified control flow. Because this makes the VPB larger than the CTU (for 32×32 and 16×16 CTUs), motion compensation can predict a larger block of luma pixels before predicting the chroma pixels. This reduces the number of switches between luma and chroma memory accesses, which, as explained later in Sect. 10.6, benefits DRAM latency.
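To make the sizing rule concrete, the following C++ sketch derives the VPB dimensions from the CTU size; the struct and function names are illustrative, not from the chapter.

```cpp
#include <cstdio>

// Sketch of the VPB sizing rule described above: the VPB is as tall
// as the CTU, but its width is fixed at 64, so a 64x64 CTU maps to
// one VPB, a 32x32 CTU to a 64x32 VPB (two CTUs side by side), and
// a 16x16 CTU to a 64x16 VPB (four CTUs).
struct PipelineBlock {
    int width;   // luma samples
    int height;  // luma samples
};

PipelineBlock vpbForCtu(int ctuSize) {
    PipelineBlock vpb;
    vpb.width  = 64;       // fixed width for a unified control flow
    vpb.height = ctuSize;  // VPB is as tall as the CTU
    return vpb;
}

int main() {
    const int ctuSizes[] = {64, 32, 16};
    for (int ctu : ctuSizes) {
        PipelineBlock vpb = vpbForCtu(ctu);
        printf("CTU %2dx%-2d -> VPB %2dx%-2d (%d CTU(s) per VPB)\n",
               ctu, ctu, vpb.width, vpb.height, vpb.width / ctu);
    }
    return 0;
}
```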
10.2.2 Split System Pipeline
To deal with the variable latency of the cache+DRAM memory system, elastic pipelining can be used between the entropy decoder, which sends read requests to the cache, and prediction, which reads data from the cache. As a result, the system pipeline can be broken into two groups. The first group contains the entropy decoder, while the second contains the inverse transform, prediction, and the subsequent in-loop filters. This scheme is shown in Fig. 10.1.
The entropy decoder uses collocated motion vectors from previously decoded pictures for motion vector prediction. A separate pipeline stage, ColMV DMA, is added before the entropy decoder to read collocated motion vectors from DRAM. This isolates the entropy decoder from the variable DRAM latency. Similarly, an extra stage, reconstruction DMA, is added after the in-loop filters in the second pipeline group to write fully reconstructed pixels back to DRAM. Within each group, the processing engines are pipelined at VPB granularity, as shown in Fig. 10.2. Pipelining across the groups is explained next.
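The VPB-granularity pipelining inside a group can be illustrated with a small scheduling sketch; the stage names follow the text, but the loop and its timing model are simplified assumptions.

```cpp
#include <cstdio>

// Sketch of VPB-granularity pipelining within the first group: in
// each time step, stage s processes VPB (t - s), so all stages run
// in parallel on consecutive VPBs once the pipeline is full.
int main() {
    const char* stages[] = {"ColMV DMA", "Entropy Decoder"};
    const int numStages = 2;
    const int numVpbs = 4;

    for (int t = 0; t < numVpbs + numStages - 1; ++t) {
        printf("step %d:", t);
        for (int s = 0; s < numStages; ++s) {
            int vpb = t - s;  // each stage lags the previous by one VPB
            if (vpb >= 0 && vpb < numVpbs)
                printf("  [%s -> VPB %d]", stages[s], vpb);
        }
        printf("\n");
    }
    return 0;
}
```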
 