Table 10.3 Area breakdown for inverse transform

Module               Logic area (kgates)
Partial transform    71
Accumulator          5
Row cache            4
FIFOs                5
Scaling + Control    19
Total                104
Table 10.4 Area for different transforms. Partial 32-pt IDCT contains all the smaller IDCTs

Module               Logic area (kgates)
4-pt IDCT            3
Partial 8-pt IDCT    10
Partial 16-pt IDCT   24
Partial 32-pt IDCT   57
4-pt IDST + misc.    14
10.4.4 Implementation Results
Table 10.3 gives the breakdown of the post-synthesis logic area at a 200 MHz clock
frequency in 40 nm CMOS. The total area is 104 kgates of logic (in terms of 2-input
NAND gates) and 16.4 kbit of SRAM. Table 10.4 shows the combinational logic
area required for the 1-D transform operations. Data-gating and zero-column skipping
can provide a power reduction of 18% and a throughput improvement of 27-66%, as
shown in [32].
10.5 Inter Prediction
HEVC inter prediction uses motion vectors pointing to one reference frame
(uni-prediction) or two reference frames (bi-prediction) to predict a block of pixels.
The size of the predicted block, called the Prediction Unit (PU), is determined by the
Coding Unit (CU) size and its partitioning mode. For example, a 32×32 CU with 2N×N
partitioning is split into two PUs of size 32×16, and a 16×16 CU with nL×2N
partitioning is split into 4×16 and 12×16 PUs.
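The mapping from CU size and partitioning mode to PU sizes can be sketched as a small lookup. This is only an illustration, not part of the described hardware: the function name `pu_sizes` and the mode strings are assumptions chosen here for readability, and the sketch does not enforce the restrictions the standard places on which CU sizes may use which modes.

```python
def pu_sizes(cu_size, part_mode):
    """Return the (width, height) of each PU in a square CU.

    cu_size   -- CU side length in pixels (the "2N" dimension)
    part_mode -- hypothetical mode string, e.g. "2NxN" or "nLx2N"
    """
    n = cu_size // 2  # half of the CU side ("N")
    q = cu_size // 4  # quarter of the CU side, used by asymmetric modes
    table = {
        "2Nx2N": [(cu_size, cu_size)],
        "2NxN": [(cu_size, n)] * 2,
        "Nx2N": [(n, cu_size)] * 2,
        "NxN": [(n, n)] * 4,
        # Asymmetric modes split one dimension into 1/4 and 3/4.
        "2NxnU": [(cu_size, q), (cu_size, cu_size - q)],
        "2NxnD": [(cu_size, cu_size - q), (cu_size, q)],
        "nLx2N": [(q, cu_size), (cu_size - q, cu_size)],
        "nRx2N": [(cu_size - q, cu_size), (q, cu_size)],
    }
    return table[part_mode]


if __name__ == "__main__":
    print(pu_sizes(32, "2NxN"))   # the 32×32 example from the text
    print(pu_sizes(16, "nLx2N"))  # the 16×16 example from the text
```

The two calls at the bottom reproduce the examples in the text: two 32×16 PUs, and one 4×16 PU followed by one 12×16 PU.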
For luma pixels, the motion vectors for each PU have a resolution of 1/4 pixel.
The predicted pixels at non-integer pixel positions are obtained by interpolating
between the reference pixels using an 8-tap FIR filter, first along the horizontal
direction and then along the vertical, as shown in Fig. 10.9. (In Main Profile, the
reverse order, i.e. vertical followed by horizontal, also gives the same result.) For
chroma, the motion vector is halved and has a resolution of 1/8 pixel, and the
predicted pixels are computed using a 4-tap interpolation filter. From Table 10.5,
which shows the cost of interpolating
 