Fig. 10.9 Interpolation process for a pixel at a fractional location x = 1/4, y = 3/4
Table 10.5 Example costs for interpolating a block of pixels

  Parameters / Costs     Generic         Y64×64          Y16×16           U4×4
  Block size (w × h)     w × h           64 × 64         16 × 16          4 × 4
  Filter size            n + 1 taps      8 taps          8 taps           4 taps
  Reference pixels       (w+n) × (h+n)   71×71 (23 %)    23×23 (106 %)    7×7 (206 %)
  Horizontal interps.    w × (h+n)       64×71 (11 %)    16×23 (43 %)     4×7 (75 %)
  Vertical interps.      w × h           64×64 (0 %)     16×16 (0 %)      4×4 (0 %)

Values in brackets denote the overhead relative to the block size. Costs are for uni-prediction only; for bi-prediction, all the costs are doubled.
a block of pixels, we see that smaller pixel blocks have a proportionately higher overhead in the number of reference pixels and in the number of horizontal interpolations. To reduce the worst-case overhead, 4×4 PUs are not allowed by the standard, and 8×4/4×8 PUs are allowed to use only uni-prediction.
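The costs in Table 10.5 follow directly from the block and filter sizes; a short sketch reproduces them (the `interp_costs` helper is illustrative, not from the text):

```python
def interp_costs(w, h, taps):
    """Uni-prediction interpolation costs for a w x h block with a `taps`-tap filter."""
    n = taps - 1                 # extra reference pixels needed per dimension
    ref = (w + n) * (h + n)      # reference pixels fetched
    horiz = w * (h + n)          # horizontal interpolations (full intermediate columns)
    vert = w * h                 # vertical interpolations: one per output pixel, 0 % overhead
    return ref, horiz, vert

# Y16x16 luma block with the 8-tap luma filter (see Table 10.5):
ref, horiz, vert = interp_costs(16, 16, 8)
overhead = 100 * (ref - 16 * 16) / (16 * 16)   # about 106 % extra reference pixels
```

Running the same helper for a 64×64 luma block gives 5041 reference pixels against 4096 output pixels (23 % overhead), confirming that the relative cost shrinks as blocks grow.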
Compared to H.264/AVC, HEVC uses:
1. Larger PUs, which require fewer interpolations per pixel but more on-chip SRAM
2. More varied PU sizes, which increase the complexity of the control logic
3. Longer interpolation filters, which require more datapath logic and more reference pixels
Reference frames may be stored in off-chip DRAM for HD and larger picture sizes, or in on-chip SRAM for smaller sizes. At the PU level, it is observed that the reference pixels of adjacent PUs overlap significantly. Due to this spatial locality, fetching the reference pixels into a motion-compensation (MC) cache helps reduce the latency and power required to access external DRAM and large on-chip SRAMs. Considering this, a top-level architecture (showing only the datapath) for an HEVC inter-prediction engine would look like Fig. 10.10.
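The spatial locality that motivates the MC cache can be made concrete with a sketch (8-tap luma filter assumed; `ref_columns` is a hypothetical helper, not from the text):

```python
def ref_columns(x0, w, taps=8):
    """Horizontal span of reference pixels needed to interpolate a block of
    width w starting at column x0, assuming a fractional horizontal MV."""
    n = taps - 1           # 7 extra columns for an 8-tap filter
    left = taps // 2 - 1   # 3 columns extend to the left of the block
    return set(range(x0 - left, x0 - left + w + n))

# Two horizontally adjacent 16x16 PUs with the same motion vector:
shared = ref_columns(0, 16) & ref_columns(16, 16)
# 7 of each PU's 23 reference columns are shared (~30 %), so the second
# PU can hit in the MC cache instead of re-fetching those pixels from DRAM.
```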
The Dispatch module generates the position and size of the reference pixel block according to the decoded motion vectors (MVs). The MC cache sends read requests to the reference frame buffer over the direct-memory-access (DMA) bus for cache misses. When all the reference pixels are present in the MC cache, the Fetch module fetches them from the cache for the 2-D Filter module. Note that it can take many cycles to get data from the DMA bus, due to the latencies of bus arbiters, the DRAM controller, and DRAM precharge/activate operations.
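A minimal sketch of the computation Dispatch performs, assuming quarter-pel MVs and the 8-tap luma filter as in HEVC (the function and its interface are illustrative, not the chapter's implementation):

```python
def ref_block(pu_x, pu_y, w, h, mv_x, mv_y, taps=8):
    """Position and size of the reference pixel block for a w x h PU,
    given a motion vector (mv_x, mv_y) in quarter-pel units."""
    ix, fx = mv_x >> 2, mv_x & 3   # integer / fractional parts of the MV
    iy, fy = mv_y >> 2, mv_y & 3
    n = taps - 1
    left = taps // 2 - 1           # filter support left of / above the sample
    # A dimension with zero fractional offset needs no filtering, hence no extension.
    x0 = pu_x + ix - (left if fx else 0)
    y0 = pu_y + iy - (left if fy else 0)
    return x0, y0, w + (n if fx else 0), h + (n if fy else 0)

# 16x16 PU at (64, 32) with MV = (5, -3) in quarter-pel units:
# fractional in both dimensions, so a 23x23 reference block is requested.
print(ref_block(64, 32, 16, 16, 5, -3))  # (62, 28, 23, 23)
```

The MC cache then compares this region against its tags and issues DMA reads only for the missing parts.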