HEVC Transform and Quantization - High Efficiency Video Coding (HEVC)

Graphics Reference

In-Depth Information

Fig. 6.11 Efficient

implementation of inverse

transform of a block with

non-zero coefficients in only

the K

K 1D IDCT

along columns

KxK

K low frequency

sub-block. Shaded regions

denote the regions that can

contain non-zero coefficients.

Only K 1D IDCTs are

required along columns

NxN

N 1D IDCT

along rows

Tabl e 6. 5 Arithmetic

operation counts for HEVC

two-dimensional inverse

transforms

N

K

O mult

O add

4

48

64

8

352

448

8

4

132

228

16

2,752

3,200

16

8

1,032

1,512

16

4

420

820

32

21,888

23,808

32

16

8,208

10,320

32

8

3,400

5,320

32

4

1,476

3,060

In general, the number of multiplications can be reduced approximately by a

factor of ( N / K ) 2 for the first stage and a factor of ( N / K ) for the second stage.

Tab le 6.5 shows the number of arithmetic operations for various values of N and K .

Note that the majority of the arithmetic operations listed in Table 6.5 can be

efficiently implemented using SIMD instructions since the operations are matrix

multiply operations. For example, for an 8 8 inverse transform implementation,

( 6.20 ) can be efficiently implemented on a 4-way SIMD processor in 4 cycles v/s

16 cycles on a processor without SIMD acceleration. Software performance using

SIMD acceleration on various Intel processor architectures for the 8 8, 16 16,

and 32 32 transform sizes are provided in [ 3 , 11 ].

Only the Even-Odd decomposition of the inverse transform has been described in

this subsection. However, the Even-Odd decomposition idea can be used to reduce

the complexity of the forward transform too. The article [ 6 ] presents analysis of

both the forward and the inverse core transform in more details. It also describes

hardware sharing by the application of property 4 of Sect. 6.2.1 (smaller transforms

being embedded in larger transforms).

High Efficiency Video Coding (HEVC)

Search WWH ::

Custom Search

Home