Table 10.1 CTU-adaptive pipeline granularity

  Coding Tree Unit (CTU)    Variable-sized Pipeline Block (VPB)
  64 × 64                   64 × 64
  32 × 32                   64 × 32
  16 × 16                   64 × 16
luma pixels and two rows of chroma pixels (per chroma component) due to the
deblocking filter's support. The size of these buffers is proportional to the width
of the picture. Further, if the picture is split into multiple tile rows, each tile row
needs a separate line buffer if the rows are to be processed in parallel. Tiles also
need column buffers to handle data dependencies between them in the horizontal
direction. Traditionally, line buffers have been implemented using on-chip SRAM.
However, for very large picture sizes, it may be necessary to store them in the
denser off-chip DRAM. This results in an area versus power trade-off, since
communication with off-chip DRAM consumes much more power.
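As a rough illustration of how the line-buffer footprint scales linearly with picture width, the sketch below estimates the deblocking line buffer for a 4:2:0 picture. The count of four buffered luma rows is an illustrative assumption (the chroma count of two rows per component comes from the text above):

```python
def line_buffer_bytes(pic_width, luma_rows=4, chroma_rows=2,
                      bit_depth=8, tile_rows=1):
    """Rough deblocking line-buffer estimate.

    Assumptions (illustrative, not from the text): 4 buffered luma rows
    and a 4:2:0 format, so each chroma row holds pic_width // 2 samples.
    Each parallel tile row needs its own copy of the line buffer.
    """
    bytes_per_sample = (bit_depth + 7) // 8
    luma = luma_rows * pic_width * bytes_per_sample
    # Two chroma components (Cb and Cr), each buffering chroma_rows rows.
    chroma = 2 * chroma_rows * (pic_width // 2) * bytes_per_sample
    return tile_rows * (luma + chroma)


# A 3840-pixel-wide (4K) picture with one tile row:
# 4*3840 + 2*2*1920 = 23040 bytes of line buffer.
print(line_buffer_bytes(3840))  # → 23040
```

Doubling the picture width or the number of parallel tile rows doubles the buffer size, which is why very large pictures may push these buffers off-chip.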
Off-chip DRAM is also commonly used to store the decoded picture buffer. The
variable latency of the off-chip DRAM must be accounted for in the system
pipeline. In particular, buffers are needed between processing blocks that
communicate with the DRAM to accommodate this variable latency. Motion
compensation makes the largest number of accesses to the external DRAM, so a
motion compensation cache is typically used to reduce them. With a cache, the
best-case latency for a memory access is determined by a cache hit and can be as
low as one cycle. However, the worst-case latency, determined by a cache miss,
remains largely unchanged, thus increasing the overall latency variability seen
by the prediction block.
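The widened hit/miss latency gap can be illustrated with a toy cache model. The 64-byte line size and the DRAM latency range below are assumptions chosen for illustration, not measurements of any particular design:

```python
import random


def access_latency(addr, cache, hit_cycles=1, miss_cycles=None):
    """Toy latency model for a motion compensation cache.

    Assumptions (illustrative): 64-byte cache lines, a 1-cycle hit,
    and a DRAM miss latency drawn from an arbitrary 30-100 cycle range
    when no fixed miss_cycles is supplied.
    """
    line = addr // 64  # map the byte address to a cache-line index
    if line in cache:
        return hit_cycles  # best case: hit, as low as one cycle
    cache.add(line)  # fill the line on a miss
    # Worst case: a miss still pays the full, variable DRAM latency.
    return miss_cycles if miss_cycles is not None else random.randint(30, 100)


cache = set()
print(access_latency(0, cache, miss_cycles=50))   # cold miss → 50
print(access_latency(32, cache))                  # same 64-byte line → 1
```

The spread between the one-cycle hit and the tens-of-cycles miss is the variability the downstream prediction pipeline must buffer against.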
To summarize, the top-level system pipeline is affected by:
1. Processing dependencies
2. Large CTU sizes
3. Large line buffers
4. Off-chip DRAM latency
10.2.1 Variable-Sized Pipeline Blocks
Compared to the all-intra or all-inter macroblocks in H.264/AVC, the Coding Tree
Units (CTU) in HEVC may contain a mix of inter and intra-coded Coding Units.
Hence, it is convenient to design the pipeline granularity to be equal to the CTU
size. If the pipeline buffers are implemented as multi-bank SRAM, the decoder
can be made power-scalable for smaller CTU sizes by shutting down the unused
banks. However, it is also possible to use the unused banks and increase the pipeline
granularity beyond the CTU size. For example, the CTU-adaptive pipeline
granularity shown in Table 10.1 is employed by [9].
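The mapping in Table 10.1 can be captured in a few lines: per the table, the VPB width stays fixed at 64 while the VPB height tracks the CTU size. The function name is illustrative:

```python
def vpb_size(ctu_size):
    """Return the (width, height) of the Variable-sized Pipeline Block
    for a given square CTU size, per Table 10.1: the VPB width is
    fixed at 64, and its height equals the CTU size."""
    assert ctu_size in (16, 32, 64), "HEVC CTU sizes considered in Table 10.1"
    return (64, ctu_size)


for ctu in (64, 32, 16):
    print(ctu, "->", vpb_size(ctu))  # 64 -> (64, 64), 32 -> (64, 32), ...
```

Keeping the VPB width at 64 lets the pipeline reuse the full set of SRAM banks even when the encoder selects smaller CTUs, rather than leaving banks idle.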