Table 10.3  Area breakdown for inverse transform

Module               Logic area (kgates)
Partial transform    71
Accumulator          5
Row cache            4
FIFOs                5
Scaling + Control    19
Total                104

Table 10.4  Area for different transforms. Partial 32-pt IDCT contains all the smaller IDCTs

Module               Logic area (kgates)
4-pt IDCT            3
Partial 8-pt IDCT    10
Partial 16-pt IDCT   24
Partial 32-pt IDCT   57
4-pt IDST + misc.    14

10.4.4 Implementation Results
Breakdown of the post-synthesis logic area at 200 MHz clock frequency in 40 nm
CMOS is given in Table 10.3. The total area is 104 kgates of logic (in terms of
2-input NAND gates) and 16.4 kbit of SRAM. Table 10.4 shows the combinational
logic area required for 1-D transform operations. Data-gating and zero-column
skipping can provide power reduction of 18% and throughput improvement of 27-66%
as shown in [32].
10.5 Inter Prediction
HEVC inter prediction uses motion vectors pointing to one reference frame
(uni-prediction) or two reference frames (bi-prediction) to predict a block of
pixels. The size of the predicted block, called Prediction Unit (PU), is
determined by the Coding Unit (CU) size and its partitioning mode. For example,
a 32×32 CU with 2N×N partitioning is split into two PUs of size 32×16, and a
16×16 CU with nL×2N partitioning is split into 4×16 and 12×16 PUs.
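
The mapping from CU size and partitioning mode to PU sizes can be sketched as a small lookup. This is a minimal illustration, not the decoder's actual logic: the function name pu_sizes is hypothetical, sizes are given as (width, height), and the modes other than 2N×N and nL×2N are filled in from the HEVC partitioning modes rather than from the text above. The sketch does not check which modes are permitted for a given CU size.

```python
def pu_sizes(cu_size, part_mode):
    """Return the (width, height) of each PU inside a square CU.

    cu_size   -- CU width/height in pixels (the "2N" in the mode names)
    part_mode -- one of the HEVC partitioning mode names as a string
    Hypothetical helper for illustration; no legality checks are done.
    """
    s = cu_size          # 2N
    h = cu_size // 2     # N
    q = cu_size // 4     # N/2, used by the asymmetric modes
    table = {
        "2Nx2N": [(s, s)],
        "2NxN":  [(s, h), (s, h)],        # two horizontal halves
        "Nx2N":  [(h, s), (h, s)],        # two vertical halves
        "NxN":   [(h, h)] * 4,            # four quadrants
        "2NxnU": [(s, q), (s, s - q)],    # thin PU on top
        "2NxnD": [(s, s - q), (s, q)],    # thin PU at the bottom
        "nLx2N": [(q, s), (s - q, s)],    # thin PU on the left
        "nRx2N": [(s - q, s), (q, s)],    # thin PU on the right
    }
    return table[part_mode]


# The two examples from the text:
print(pu_sizes(32, "2NxN"))   # 32x32 CU, 2NxN
print(pu_sizes(16, "nLx2N"))  # 16x16 CU, nLx2N
```

In every mode the PU areas add up to the CU area, which is a quick sanity check on the table.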
For luma pixels, the motion vectors for each PU have a resolution of 1/4-th pixel.
The predicted pixels at non-integer pixel positions are obtained by interpolating
between the reference pixels using an 8-tap FIR filter, first along the horizontal
direction and then along the vertical, as shown in Fig. 10.9. (In Main Profile, the
reverse order, i.e. vertical followed by horizontal, also gives the same result.)
For chroma, the motion vector is halved and has a resolution of 1/8-th pixel; the
predicted pixels are computed using a 4-tap interpolation filter. From Table 10.5,
which shows the cost of interpolating