3.4.2.5 Memory Management
The DMA read and DMA write depicted in Fig. 3.84 could delay the
macroblock-level pipeline processing. To prevent this, the efficiency of the
image-data transfer has to be improved, especially for the reference-image read. To
achieve an efficient 2D data transfer, an address transformation scheme is introduced
in the memory management unit for the VPU and other media IPs in order to avoid page
misses in the external SDRAM.
Most video codec standards require small, submacroblock-level 2D data transfers
for reference reads in the decoding mode. Without a technique designed for such
transfers, every line of the block results in a page miss. The penalty for
a page miss, which is around ten cycles or more, consumes a high proportion of
memory bandwidth. Efficiency in the 2D data transfer is thus critically important.
In embedded systems such as mobile applications, in which various kinds of
software are executed, it is not feasible to adopt a particular form of memory
allocation such as a bank-interleave operation for each pixel line.
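The per-line page miss can be illustrated with a small sketch. It is not taken from the source: it assumes a raster (line-by-line) frame layout with one byte per pixel, models an SDRAM page simply as a fixed-size address range, and ignores banks. The function name pages_touched and the 2048-byte stride and page size are hypothetical values chosen for illustration.

```python
def pages_touched(x0, y0, width, height, stride, page_size):
    """Count the distinct pages covered by a 2D block read.

    Assumes a raster layout (address = y * stride + x) and models a page
    as a contiguous range of page_size bytes. Real SDRAM also has banks,
    which this simplified model ignores.
    """
    pages = set()
    for y in range(y0, y0 + height):
        for x in range(x0, x0 + width):
            pages.add((y * stride + x) // page_size)
    return len(pages)


# Hypothetical example: an 8 x 8 block in a frame whose stride equals the
# assumed page size, so every line of the block falls in a different page.
print(pages_touched(0, 0, 8, 8, stride=2048, page_size=2048))  # 8
```

Under these assumptions an 8 × 8 reference read opens eight pages to fetch only 64 bytes, which is the behavior the address translation described next is meant to avoid.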
To avoid page misses, tile-linear address translation (TLAT) [76] is introduced
between the video codec and the on-chip interconnect. Figure 3.85a shows the
TLAT circuits and memory allocation in the virtual address (VADR) and physical
address (PADR) spaces. The lower-order bits of the VADR issued by the video codec
are rearranged into the corresponding PADR. As shown in Fig. 3.85b, a 32 × 32 tile
access from the video codec is mapped to linear addressing in the PADR space.
When the lower address of the VADR is defined as VADR[m:0], the PADR is
described as follows:
PADR[m : TB+VB+HB] = VADR[m : TB+VB+HB];
PADR[TB+VB+HB-1 : TB+VB] = VADR[TB+HB-1 : TB];
PADR[TB+VB-1 : TB] = VADR[TB+HB+VB-1 : TB+HB];
PADR[TB-1 : 0] = VADR[TB-1 : 0].
In these equations, TB, HB, and VB are calculated as follows:
TB = log2(Blk_h),
HB = log2(stride) - TB,
VB = log2(Blk_v);
stride, Blk_h, and Blk_v must be powers of two.
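The bit rearrangement above can be written out as a short software sketch. This is only a model of the equations, not the TLAT circuit itself; the function names tlat_params and vadr_to_padr are hypothetical, and the example values (a stride of 2048 and a 32 × 32 tile, with one address unit per pixel) are assumptions made for illustration.

```python
def tlat_params(stride, blk_h, blk_v):
    """Return (TB, HB, VB) for the given stride and tile size."""
    for name, value in (("stride", stride), ("blk_h", blk_h), ("blk_v", blk_v)):
        if value <= 0 or (value & (value - 1)) != 0:
            raise ValueError(f"{name} must be a power of two")
    if stride < blk_h:
        raise ValueError("stride must be at least blk_h")
    tb = blk_h.bit_length() - 1          # TB = log2(Blk_h)
    hb = (stride.bit_length() - 1) - tb  # HB = log2(stride) - TB
    vb = blk_v.bit_length() - 1          # VB = log2(Blk_v)
    return tb, hb, vb


def vadr_to_padr(vadr, stride, blk_h, blk_v):
    """Rearrange the lower VADR bits into a PADR, following the equations."""
    tb, hb, vb = tlat_params(stride, blk_h, blk_v)
    low = vadr & ((1 << tb) - 1)                # VADR[TB-1:0]
    h = (vadr >> tb) & ((1 << hb) - 1)          # VADR[TB+HB-1:TB]
    v = (vadr >> (tb + hb)) & ((1 << vb) - 1)   # VADR[TB+HB+VB-1:TB+HB]
    upper = vadr >> (tb + hb + vb)              # VADR[m:TB+VB+HB]
    return (upper << (tb + hb + vb)) | (h << (tb + vb)) | (v << tb) | low


# Assumed example: stride 2048, 32 x 32 tile.
# The start of line 1 (VADR 2048) lands directly after line 0 of the same tile.
print(vadr_to_padr(2048, 2048, 32, 32))  # 32
# The first pixel of the next tile to the right (VADR 32) starts a new 1024-unit tile.
print(vadr_to_padr(32, 2048, 32, 32))    # 1024
```

With these values, the 32 lines of one tile occupy 1024 consecutive physical addresses, so a small 2D read inside a tile stays within a contiguous region instead of jumping by one stride per line.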
With this address translation scheme, codec performance improved by up to 47%
for the bi-predictive picture (called the B-picture), and power consumption
in the video codec core was reduced by 16% [76]. This scheme is also
well suited for image rotation and block-based filter processing.
3.4.3 Processor Elements
To provide flexibility for handling multiple video coding standards, a stream processor
and six media processors are implemented in the video processing unit. Two fine