Table 10.6 Number of horizontal interpolations for each PU type

                            No. of horizontal interpolations   No. of vertical interpolations
PU type   Uni/bi-directional   per PU         per pixel        per PU         per pixel
Y16×16    Uni/bi               2 × 16 × 23    2.875            2 × 16 × 16    2
Y8×8      Uni/bi               2 × 8 × 15     3.75             2 × 8 × 8      2
Y16×4     Uni/bi               2 × 16 × 11    5.5              2 × 16 × 4     2
Y4×16     Uni/bi               2 × 4 × 23     2.875            2 × 4 × 16     2
Y8×4      Uni                  8 × 11         2.75             8 × 4          1
Y4×8      Uni                  4 × 15         1.875            4 × 8          1
UV8×8     Uni/bi               2 × 8 × 11     2.75             2 × 8 × 8      2
UV4×4     Uni/bi               2 × 4 × 7      3.5              2 × 4 × 4      2
UV8×2     Uni/bi               2 × 8 × 5      5                2 × 8 × 2      2
UV2×8     Uni/bi               2 × 2 × 11     2.75             2 × 2 × 8      2
UV4×2     Uni                  4 × 5          2.5              4 × 2          1
UV2×4     Uni                  2 × 7          1.75             2 × 4          1
Some PU types are restricted to uni-prediction, while other types can use either uni- or bi-prediction.
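The per-PU entries in Table 10.6 appear to follow a simple pattern: for an N-tap filter, the horizontal pass covers the PU rows plus N − 1 extra rows needed by the vertical pass, and the count doubles when bi-prediction is allowed. The following Python sketch reproduces the per-pixel columns under that assumption. The formula is inferred from the table entries, and the function name `interpolations_per_pixel` and its parameters are hypothetical.

```python
# Filter lengths as stated in the text: 8-tap for luma (Y), 4-tap for chroma (UV).
TAPS = {"Y": 8, "UV": 4}

def interpolations_per_pixel(component, first_dim, second_dim, bi_allowed):
    """Return (horizontal, vertical) interpolations per pixel for one PU.

    first_dim and second_dim are the two PU dimensions in the order used by
    the table labels (e.g. Y16x4 -> 16, 4). Inferred pattern, not a formula
    quoted from the source.
    """
    n_ref = 2 if bi_allowed else 1               # bi-prediction doubles the work
    extended = second_dim + TAPS[component] - 1  # extra samples for the second pass
    horizontal = n_ref * first_dim * extended
    vertical = n_ref * first_dim * second_dim
    pixels = first_dim * second_dim
    return horizontal / pixels, vertical / pixels

if __name__ == "__main__":
    for name, args in [("Y16x4", ("Y", 16, 4, True)),
                       ("Y4x8", ("Y", 4, 8, False)),
                       ("UV8x2", ("UV", 8, 2, True))]:
        print(name, interpolations_per_pixel(*args))
```

The Y16×4 case gives the largest horizontal value among the table rows, which is the figure used to size the horizontal filters in Sect. 10.5.2.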
10.5.2 PU-Adaptive Pipelining in 2-D Filter
The 2-D filter must handle luma and chroma PUs of size 16×16 and smaller,
which require different numbers of interpolations, as shown in Table 10.6. The Y16×4
PU requires the most horizontal interpolations (5.5 per pixel), so a throughput of
2 pixel/cycle requires 11 horizontal filters. By a similar analysis (2 vertical
interpolations per pixel at 2 pixel/cycle), four vertical filters are required.
However, this results in a mismatch between the peak throughput of the horizontal
filters (11 pixel/cycle) and that of the vertical filters (4 pixel/cycle).
The designer can either add a buffer after the horizontal filters to absorb the
mismatch or match the peak throughput by using 11 vertical filters.
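The filter counts above follow from multiplying the worst-case interpolations per pixel by the target throughput. A minimal Python sketch of that arithmetic, using the per-pixel values from Table 10.6, is shown here. The dictionary and function names are hypothetical, and rounding up to a whole number of filters is an assumption.

```python
import math

# Per-pixel interpolation counts copied from Table 10.6.
HORIZONTAL_PER_PIXEL = {
    "Y16x16": 2.875, "Y8x8": 3.75, "Y16x4": 5.5, "Y4x16": 2.875,
    "Y8x4": 2.75, "Y4x8": 1.875,
    "UV8x8": 2.75, "UV4x4": 3.5, "UV8x2": 5.0, "UV2x8": 2.75,
    "UV4x2": 2.5, "UV2x4": 1.75,
}
VERTICAL_PER_PIXEL = {
    "Y16x16": 2, "Y8x8": 2, "Y16x4": 2, "Y4x16": 2, "Y8x4": 1, "Y4x8": 1,
    "UV8x8": 2, "UV4x4": 2, "UV8x2": 2, "UV2x8": 2, "UV4x2": 1, "UV2x4": 1,
}

def filters_needed(per_pixel, pixels_per_cycle):
    """Filters required to sustain the throughput for the worst-case PU type."""
    worst = max(per_pixel.values())
    return math.ceil(worst * pixels_per_cycle)

if __name__ == "__main__":
    print("horizontal filters:", filters_needed(HORIZONTAL_PER_PIXEL, 2))
    print("vertical filters:", filters_needed(VERTICAL_PER_PIXEL, 2))
```

With a 2 pixel/cycle target this gives 11 horizontal and 4 vertical filters, which is the mismatch the buffer or the additional vertical filters are meant to resolve.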
10.5.3 TMMCM for Interpolation Filter
The 6-tap interpolation filter in H.264/AVC is easy to optimize due to its symmetry
and simple coefficients [1]. HEVC, however, uses longer 8-tap and 4-tap filters
for luma and chroma respectively, and the filter coefficients are also
more complex. In [6], a 1-D luma filter design with 16 adders and a 2-D filter
reuse scheme for 4×4 sub-blocks are proposed. A 1-D filter design using only
13 adders is also possible by unifying the luma and chroma filters into a single
design and optimizing it with time-multiplexed multiple-constant multiplication
(TMMCM). TMMCM is similar to the MCM seen in Sect. 10.4 on the inverse transform.
However, exactly one of the MCM outputs is needed in each clock cycle, which
allows further optimization by placing multiplexers within the MCM adder tree.
One such TMMCM optimization is explained in some detail next.
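As a rough illustration of the general idea only, and not the 13-adder design referred to above, the following Python sketch models a time-multiplexed constant multiplier for a single luma tap. That tap takes the coefficient 58, 40, or 17 depending on the fractional sample position. Because only one product is needed per cycle, multiplexers select the shifted inputs of two shared adders instead of computing all three products in parallel. The decomposition, the `MUX_SETTINGS` table, and the function name are assumptions made for this sketch.

```python
# Multiplexer settings per coefficient (illustrative decomposition):
#   (shift of input A, shift of input B, subtract B?, add 2x in second adder?)
MUX_SETTINGS = {
    58: (6, 3, True, True),    # 64x - 8x + 2x
    40: (5, 3, False, False),  # 32x + 8x
    17: (4, 0, False, False),  # 16x + x
}

def tm_multiply(x, coeff):
    """Multiply x by the selected constant using two shared adders."""
    a_shift, b_shift, subtract_b, add_2x = MUX_SETTINGS[coeff]
    a = x << a_shift                       # mux 1 picks 16x, 32x or 64x
    b = x << b_shift                       # mux 2 picks x or 8x
    first = a - b if subtract_b else a + b # adder 1 (add/subtract)
    c = (x << 1) if add_2x else 0          # mux 3 picks 2x or 0
    return first + c                       # adder 2

if __name__ == "__main__":
    for coeff in MUX_SETTINGS:
        print(coeff, tm_multiply(100, coeff))
```

In this model the same two adders serve all three constants, with the selection changing from cycle to cycle; a hardware version would replace the dictionary lookup with multiplexer control signals.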
 