Graphics Reference
In-Depth Information
Fig. 6.11 Efficient
implementation of inverse
transform of a block with
non-zero coefficients in only
the K
K 1D IDCT
along columns
KxK
K low frequency
sub-block. Shaded regions
denote the regions that can
contain non-zero coefficients.
Only K 1D IDCTs are
required along columns
NxN
N 1D IDCT
along rows
Tabl e 6. 5 Arithmetic
operation counts for HEVC
two-dimensional inverse
transforms
N
K
O mult
O add
4
4
48
64
8
8
352
448
8
4
132
228
16
16
2,752
3,200
16
8
1,032
1,512
16
4
420
820
32
32
21,888
23,808
32
16
8,208
10,320
32
8
3,400
5,320
32
4
1,476
3,060
In general, the number of multiplications can be reduced approximately by a
factor of ( N / K ) 2 for the first stage and a factor of ( N / K ) for the second stage.
Tab le 6.5 shows the number of arithmetic operations for various values of N and K .
Note that the majority of the arithmetic operations listed in Table 6.5 can be
efficiently implemented using SIMD instructions since the operations are matrix
multiply operations. For example, for an 8 8 inverse transform implementation,
( 6.20 ) can be efficiently implemented on a 4-way SIMD processor in 4 cycles v/s
16 cycles on a processor without SIMD acceleration. Software performance using
SIMD acceleration on various Intel processor architectures for the 8 8, 16 16,
and 32 32 transform sizes are provided in [ 3 , 11 ].
Only the Even-Odd decomposition of the inverse transform has been described in
this subsection. However, the Even-Odd decomposition idea can be used to reduce
the complexity of the forward transform too. The article [ 6 ] presents analysis of
both the forward and the inverse core transform in more details. It also describes
hardware sharing by the application of property 4 of Sect. 6.2.1 (smaller transforms
being embedded in larger transforms).
 
Search WWH ::




Custom Search