Processor Cores - Heterogeneous Multicore Processor Technologies for Embedded Systems

3.4.2.5 Memory Management

The DMA read and DMA write depicted in Fig. 3.84 could cause a delay in the macroblock-level pipeline processing. To prevent this, the efficiency of the image-data transfer has to be improved, especially for the reference-image read. To achieve an efficient 2D data transfer, an address transformation scheme is introduced in the memory management unit for the VPU and other media IPs in order to avoid page misses in the external SDRAM.

Most video codec standards require small, sub-macroblock-level 2D data transfers for reference reads in the decoding mode. Without a specific technique for such transfers, a page miss occurs at every line. The penalty for a page miss is around ten cycles or more, so repeated misses consume a large proportion of the memory bandwidth. Efficiency in the 2D data transfer is thus critically important. In embedded systems such as mobile applications, in which various kinds of software are executed, it is not feasible to adopt a particular form of memory allocation, such as a bank-interleave operation for each pixel line.
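The per-line page miss described above can be illustrated with a minimal sketch. It is not taken from the source: it assumes one byte per pixel, a linear raster layout, a hypothetical 2048-byte SDRAM page, and a single open page at a time (no bank interleaving), so that every change of page counts as a miss. The function names are illustrative only.

```python
def pages_touched_linear(x, y, width, height, stride, page_bytes):
    """List the SDRAM page index of each access run, line by line.

    Assumes a linear raster layout with one byte per pixel, so the byte
    address of pixel (x, y) is y * stride + x.
    """
    pages = []
    for row in range(y, y + height):
        start = row * stride + x          # first byte of this line of the block
        end = start + width - 1           # last byte of this line of the block
        for page in range(start // page_bytes, end // page_bytes + 1):
            pages.append(page)
    return pages


def count_page_openings(pages):
    """Count how often the accessed page differs from the one currently open."""
    openings = 0
    current = None
    for page in pages:
        if page != current:
            openings += 1
            current = page
    return openings


# An 8 x 8 reference block in a frame whose stride equals the assumed page size.
pages = pages_touched_linear(x=16, y=16, width=8, height=8,
                             stride=2048, page_bytes=2048)
print(count_page_openings(pages))  # 8: one page opening per line of the block
```

With these assumed parameters each line of the block lies in a different page, so reading 64 bytes costs eight page openings.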

To avoid page misses, tile-linear address translation (TLAT) [76] is introduced between the video codec and the on-chip interconnect. Figure 3.85a shows the TLAT circuits and the memory allocation in the virtual address (VADR) and physical address (PADR) spaces. The lower-order bits of the VADR issued by the video codec are rearranged to form the corresponding PADR. As shown in Fig. 3.85b, a 32 × 32 tile access from the video codec is mapped to linear addressing in the PADR space. When the lower address of the VADR is defined as VADR[m:0], the PADR is described as follows:

PADR[m : TB+VB+HB] = VADR[m : TB+VB+HB];
PADR[TB+VB+HB-1 : TB+VB] = VADR[TB+HB-1 : TB];
PADR[TB+VB-1 : TB] = VADR[TB+HB+VB-1 : TB+HB];
PADR[TB-1 : 0] = VADR[TB-1 : 0].

In these equations, TB, HB, and VB are calculated as follows:

TB = log2(Blk_h),
HB = log2(stride) - TB,
VB = log2(Blk_v).

Stride, Blk_h, and Blk_v must be powers of two.
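The bit rearrangement defined by these equations can be sketched in software as follows. This is only an illustrative model of the mapping, not the TLAT circuit itself. It assumes that Blk_h is the tile width and stride the line pitch, both in bytes, and that Blk_v is the tile height in lines; the function name `tlat` and the example parameters (stride of 2048 bytes, 32 × 32 tile) are hypothetical.

```python
def tlat(vadr, stride, blk_h, blk_v):
    """Rearrange the lower VADR bits into a PADR, following the equations above."""
    for name, value in (("stride", stride), ("blk_h", blk_h), ("blk_v", blk_v)):
        if value <= 0 or value & (value - 1):
            raise ValueError(f"{name} must be a power of two")
    if stride < blk_h:
        raise ValueError("stride must be at least blk_h")

    tb = blk_h.bit_length() - 1               # TB = log2(Blk_h)
    hb = (stride.bit_length() - 1) - tb       # HB = log2(stride) - TB
    vb = blk_v.bit_length() - 1               # VB = log2(Blk_v)

    offset = vadr & ((1 << tb) - 1)                 # VADR[TB-1 : 0]
    tile_x = (vadr >> tb) & ((1 << hb) - 1)         # VADR[TB+HB-1 : TB]
    line = (vadr >> (tb + hb)) & ((1 << vb) - 1)    # VADR[TB+HB+VB-1 : TB+HB]
    upper = vadr >> (tb + hb + vb)                  # VADR[m : TB+VB+HB]

    return ((upper << (tb + hb + vb))   # PADR[m : TB+VB+HB]
            | (tile_x << (tb + vb))     # PADR[TB+VB+HB-1 : TB+VB]
            | (line << tb)              # PADR[TB+VB-1 : TB]
            | offset)                   # PADR[TB-1 : 0]


# Addresses of one 32 x 32 tile (fourth tile from the left, top tile row),
# visited line by line in the virtual address space.
STRIDE, BLK_H, BLK_V = 2048, 32, 32
tile = [tlat(line * STRIDE + 3 * BLK_H + off, STRIDE, BLK_H, BLK_V)
        for line in range(BLK_V) for off in range(BLK_H)]
print(min(tile), max(tile), len(set(tile)))  # 3072 4095 1024
```

In this example the 32 lines of the tile, which are 2048 bytes apart in the VADR space, land in 1024 consecutive bytes of the PADR space. Under the further assumption that an SDRAM page is at least that large and tile-aligned, the whole tile can be read without changing pages.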

With this address translation scheme, codec performance improved by up to 47% for bi-predictive pictures (B-pictures), and power consumption in the video codec core was reduced by 16% [76]. This scheme is also well suited for image rotation and block-based filter processing.

3.4.3 Processor Elements

To provide flexibility for handling multiple video coding standards, a stream processor and six media processors are implemented in the video processing unit. Two fine
