Table 10.1 CTU-adaptive pipeline granularity

  Coding Tree Unit (CTU)    Variable-sized Pipeline Block (VPB)
  64 × 64                   64 × 64
  32 × 32                   64 × 32
  16 × 16                   64 × 16
luma pixels and two rows of chroma pixels (per chroma component) due to the
deblocking filter's support. The size of these buffers is proportional to the width
of the picture. Further, if the picture is split into multiple tile rows, each tile row
needs a separate line buffer if the rows are to be processed in parallel. Tiles also
need column buffers to handle data dependencies between them in the horizontal
direction. Traditionally, line buffers have been implemented using on-chip SRAM.
However, for very large picture sizes, it may be necessary to store them in the
denser off-chip DRAM. This results in an area versus power trade-off, since
communication with off-chip DRAM consumes much more power.
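As a rough illustration of how the line-buffer footprint scales linearly with picture width, the sketch below estimates the deblocking line buffer for a 4:2:0 picture. The count of four buffered luma rows is an illustrative assumption (the chroma count of two rows per component comes from the text above):

```python
def line_buffer_bytes(pic_width, luma_rows=4, chroma_rows=2,
                      bit_depth=8, tile_rows=1):
    """Rough deblocking line-buffer estimate.

    Assumptions (illustrative, not from the text): 4 buffered luma rows
    and a 4:2:0 format, so each chroma row holds pic_width // 2 samples.
    Each parallel tile row needs its own copy of the line buffer.
    """
    bytes_per_sample = (bit_depth + 7) // 8
    luma = luma_rows * pic_width * bytes_per_sample
    # Two chroma components (Cb and Cr), each buffering chroma_rows rows.
    chroma = 2 * chroma_rows * (pic_width // 2) * bytes_per_sample
    return tile_rows * (luma + chroma)


# A 3840-pixel-wide (4K) picture with one tile row:
# 4*3840 + 2*2*1920 = 23040 bytes of line buffer.
print(line_buffer_bytes(3840))  # → 23040
```

Doubling the picture width or the number of parallel tile rows doubles the buffer size, which is why very large pictures may push these buffers off-chip.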
Off-chip DRAM is also commonly used to store the decoded picture buffer. The
variable latency of the off-chip DRAM must be accounted for in the system
pipeline. In particular, buffers are needed between processing blocks that
communicate with the DRAM to accommodate this variable latency. Motion
compensation makes the largest number of accesses to the external DRAM, so a
motion compensation cache is typically used to reduce them. With a cache, the
best-case latency for a memory access is determined by a cache hit and can be as
low as one cycle. However, the worst-case latency, determined by a cache miss,
remains largely unchanged, thus increasing the overall latency variability seen
by the prediction block.
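The widened hit/miss latency gap can be illustrated with a toy cache model. The 64-byte line size and the DRAM latency range below are assumptions chosen for illustration, not measurements of any particular design:

```python
import random


def access_latency(addr, cache, hit_cycles=1, miss_cycles=None):
    """Toy latency model for a motion compensation cache.

    Assumptions (illustrative): 64-byte cache lines, a 1-cycle hit,
    and a DRAM miss latency drawn from an arbitrary 30-100 cycle range
    when no fixed miss_cycles is supplied.
    """
    line = addr // 64  # map the byte address to a cache-line index
    if line in cache:
        return hit_cycles  # best case: hit, as low as one cycle
    cache.add(line)  # fill the line on a miss
    # Worst case: a miss still pays the full, variable DRAM latency.
    return miss_cycles if miss_cycles is not None else random.randint(30, 100)


cache = set()
print(access_latency(0, cache, miss_cycles=50))   # cold miss → 50
print(access_latency(32, cache))                  # same 64-byte line → 1
```

The spread between the one-cycle hit and the tens-of-cycles miss is the variability the downstream prediction pipeline must buffer against.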
To summarize, the top-level system pipeline is affected by:
1. Processing dependencies
2. Large CTU sizes
3. Large line buffers
4. Off-chip DRAM latency
10.2.1 Variable-Sized Pipeline Blocks
Compared to the all-intra or all-inter macroblocks in H.264/AVC, the Coding Tree
Units (CTU) in HEVC may contain a mix of inter and intra-coded Coding Units.
Hence, it is convenient to design the pipeline granularity to be equal to the CTU
size. If the pipeline buffers are implemented as multi-bank SRAM, the decoder
can be made power-scalable for smaller CTU sizes by shutting down the unused
banks. However, it is also possible to use the unused banks and increase the pipeline
granularity beyond the CTU size. For example, the CTU-adaptive pipeline
granularity shown in Table 10.1 is employed by [9].
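The mapping in Table 10.1 can be captured in a few lines: per the table, the VPB width stays fixed at 64 while the VPB height tracks the CTU size. The function name is illustrative:

```python
def vpb_size(ctu_size):
    """Return the (width, height) of the Variable-sized Pipeline Block
    for a given square CTU size, per Table 10.1: the VPB width is
    fixed at 64, and its height equals the CTU size."""
    assert ctu_size in (16, 32, 64), "HEVC CTU sizes considered in Table 10.1"
    return (64, ctu_size)


for ctu in (64, 32, 16):
    print(ctu, "->", vpb_size(ctu))  # 64 -> (64, 64), 32 -> (64, 32), ...
```

Keeping the VPB width at 64 lets the pipeline reuse the full set of SRAM banks even when the encoder selects smaller CTUs, rather than leaving banks idle.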