difference between the current PU and the matching PU in the reference frames
constitutes the prediction error (a.k.a. residue) which is transformed and quantized
and coded in the bitstream.
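The matching step and the residue described above can be illustrated with a
small Python sketch. It assumes 8-bit luma frames stored as lists of rows, the
sum of absolute differences (SAD) as the matching cost, and an exhaustive
integer-pixel search. The function names (`sad`, `full_search`, `residual`) are
hypothetical and are not taken from HM or any particular hardware design.

```python
def sad(cur, ref, bx, by, rx, ry, w, h):
    """Sum of absolute differences between the w x h block at (bx, by) in
    cur and the w x h block at (rx, ry) in ref."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(h) for i in range(w))


def full_search(cur, ref, bx, by, w, h, search_range):
    """Test every integer displacement within +/- search_range that keeps the
    reference block inside the frame; return (best_mv, best_sad)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + w > width or ry + h > height:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, bx, by, rx, ry, w, h)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def residual(cur, ref, bx, by, mv, w, h):
    """Prediction error: current block minus the motion-compensated block."""
    dx, dy = mv
    return [[cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i]
             for i in range(w)] for j in range(h)]


# Usage: the current frame is the reference shifted by (2, 1).
ref = [[7 * x + 13 * y for x in range(8)] for y in range(8)]
cur = [[7 * (x + 2) + 13 * (y + 1) for x in range(8)] for y in range(8)]
mv, cost = full_search(cur, ref, 2, 2, 4, 4, 2)  # -> (2, 1), 0
```

Here the residue of the best match is all zeros; for real content it is the
small signal that is then transformed, quantized and entropy coded.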
In comparison with H.264/AVC, inter prediction in HEVC has three major
differences: (1) a larger diversity of block sizes, (2) a high-complexity mode
decision (HCMD) is needed to achieve sufficient coding gain, and (3) a longer
sub-pixel interpolation filter. In HEVC, the PU size may range from 4×8/8×4 to
64×64, so the computational complexity of deciding the best block partition also
increases considerably. To accurately choose the best mode among such a large
number of possible modes, full rate-distortion optimization (RDO), which invokes
more accurate distortion and bit estimation, needs to be applied. This requires
inter prediction to preserve several candidate modes for the later HCMD stage.
HEVC uses 8-tap or 7-tap interpolation filters for higher interpolation accuracy,
compared with the 6-tap filter in H.264/AVC, so the complexity of sub-pixel
calculation is also higher. To cope with these complexity increases, higher
parallelism in hardware is necessary, and it should be achieved with only a
moderate cost increase. In addition, this hardware parallelism induces a much
higher memory access bandwidth, so a memory subsystem that supports the high
bandwidth requirement is needed for motion estimation to work properly. These
issues are covered later in this section.
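To make the growth in block-size diversity concrete, the sketch below
enumerates the distinct inter PU shapes (width, height). It assumes 64×64
CTUs, an 8×8 minimum CU, the symmetric modes 2N×2N, 2N×N and N×2N at every CU
size, asymmetric motion partitions (AMP) only for CUs larger than the minimum,
and no inter N×N partitions (hence no 4×4 PUs). `inter_pu_shapes` is a
hypothetical helper, not part of any reference encoder.

```python
def inter_pu_shapes(cu_sizes=(64, 32, 16, 8), min_cu=8, amp=True):
    """Distinct (width, height) inter PU shapes under the stated assumptions:
    2Nx2N, 2NxN and Nx2N for every CU size; AMP (2NxnU, 2NxnD, nLx2N, nRx2N)
    only for CUs larger than min_cu; inter NxN is not modeled."""
    shapes = set()
    for s in cu_sizes:
        half, quarter = s // 2, s // 4
        # Symmetric partitions.
        shapes.update({(s, s), (s, half), (half, s)})
        if amp and s > min_cu:
            # Each AMP mode splits the CU into a 1/4 part and a 3/4 part.
            shapes.update({(s, quarter), (s, 3 * quarter),
                           (quarter, s), (3 * quarter, s)})
    return sorted(shapes)


shapes = inter_pu_shapes()
print(len(shapes), shapes[0], shapes[-1])  # 24 (4, 8) (64, 64)
```

Under these assumptions there are 24 distinct PU shapes, compared with the seven
partition shapes (16×16 down to 4×4) of H.264/AVC, and each one adds candidates
that the mode decision may have to evaluate.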
11.3.1 Motion Estimation
Because the two stages differ in the nature of their processing, inter
prediction is usually divided into two major modules, integer motion estimation
(IME) and fractional motion estimation (FME), corresponding to two granularity
levels, the integer level and the fractional level. IME usually performs a
coarse search over the whole search region; at this level the parallelism
requirement is high, while the accuracy requirement is moderate. FME then
performs a fine search around the IME result at sub-pixel accuracy, which
requires 8-tap or 7-tap interpolation filtering to obtain the pixels at
fractional positions. Since the distortion costs of neighboring sub-pixel
candidates are similar, higher accuracy in the distortion computation is
required to select the best candidate. The reference architecture is shown in
Fig. 11.2.
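The FME interpolation step can be sketched for the one-dimensional horizontal
case. The coefficients below are the HEVC luma filters for the quarter-, half-
and three-quarter-sample positions; for comparison, H.264/AVC uses the 6-tap
half-sample filter (1, -5, 20, 20, -5, 1)/32. The sketch assumes 8-bit samples,
clamps taps at the row edges, and rounds and clips each output back to 8 bits,
which simplifies the higher-precision intermediate path the standard uses for
two-dimensional (horizontal then vertical) interpolation. `interp_row` and
`LUMA_FILTERS` are hypothetical names.

```python
# HEVC luma interpolation filters, keyed by quarter-sample phase.
# Taps apply to integer samples x-3 .. x+4; each set sums to 64.
LUMA_FILTERS = {
    1: (-1, 4, -10, 58, 17, -5, 1, 0),   # 1/4 position (7 non-zero taps)
    2: (-1, 4, -11, 40, 40, -11, 4, -1), # 1/2 position (8 taps)
    3: (0, 1, -5, 17, 58, -10, 4, -1),   # 3/4 position (7 non-zero taps)
}


def interp_row(row, x, frac):
    """Interpolated luma sample at horizontal position x + frac/4
    (frac in 1..3) from one row of 8-bit integer samples."""
    coeffs = LUMA_FILTERS[frac]
    acc = 0
    for k, c in enumerate(coeffs):
        # Clamp tap positions to the row, a simple form of edge padding.
        idx = min(max(x + k - 3, 0), len(row) - 1)
        acc += c * row[idx]
    # Normalize by 64 with rounding, then clip to the 8-bit range.
    return min(max((acc + 32) >> 6, 0), 255)


ramp = [10 * i for i in range(16)]
print(interp_row(ramp, 8, 2))  # 85, the midpoint between 80 and 90
```

Because the quarter and three-quarter filters each have one zero coefficient,
they are effectively 7-tap, which is where the "8-tap or 7-tap" figure comes
from. Each fractional candidate evaluated in FME needs such filtered samples
for the whole PU, which is why sub-pixel calculation dominates FME cost.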
In previous works, various architectures for variable block size motion
estimation have been compared [7, 24]. A fast gradient-based algorithm on a
parallel 2D SAD tree with high data reuse is described in [10]. Data reuse for
motion estimation is explored in [13]. To increase parallelism, highly parallel
inter mode decision in HEVC is achieved by dependency removal in [41]. Finally,
[35] describes how throughput requirements can be met by processing multiple
CUs in parallel while processing the PUs within each CU serially, preserving the
same sequential order as in HM. The results show that small block sizes (e.g.,
4×4, 4×8, 8×4) require significantly larger hardware but provide only modest
improvements in coding efficiency. In addition, a search range strategy centered
on the advanced motion vector prediction (AMVP) candidates, with pre-fetching
and limited search range movement, is presented.
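The data reuse behind a SAD tree can be illustrated in a few lines: for one
search candidate, the SADs of the smallest (4×4) sub-blocks are computed once,
and the SAD of any larger aligned partition is obtained by adding them up
instead of revisiting pixels. This is a sketch of the general idea only, not the
architecture of [10]; it assumes a 16×16 block, lists-of-rows frames, and the
hypothetical helper names `sad_4x4_grid` and `merged_sad`.

```python
def sad_4x4_grid(cur, ref, bx, by, rx, ry, size=16):
    """SADs of every 4x4 sub-block of a size x size block for one candidate:
    grid[j][i] covers rows 4j..4j+3 and columns 4i..4i+3 of the block."""
    n = size // 4
    grid = [[0] * n for _ in range(n)]
    for y in range(size):
        for x in range(size):
            grid[y // 4][x // 4] += abs(cur[by + y][bx + x] - ref[ry + y][rx + x])
    return grid


def merged_sad(grid, i0, j0, w4, h4):
    """SAD of the partition spanning w4 x h4 units of 4x4 starting at unit
    (i0, j0), computed by summing the precomputed 4x4 SADs."""
    return sum(grid[j][i] for j in range(j0, j0 + h4)
               for i in range(i0, i0 + w4))


cur = [[(x * x + 3 * y) % 17 for x in range(16)] for y in range(16)]
ref = [[(2 * x + y * y) % 13 for x in range(16)] for y in range(16)]
grid = sad_4x4_grid(cur, ref, 0, 0, 0, 0)
sad_16x16 = merged_sad(grid, 0, 0, 4, 4)    # whole block
sad_8x8_tr = merged_sad(grid, 2, 0, 2, 2)   # top-right 8x8
sad_8x4 = merged_sad(grid, 0, 0, 2, 1)      # top-left 8x4
```

In hardware, the additions form a tree of adders fed by the 4×4 SAD units, so
every partition size is scored in parallel for the same candidate; supporting
the smallest shapes as separate outputs is part of what makes small blocks
costly in area.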