Processor Cores - Heterogeneous Multicore Processor Technologies for Embedded Systems


frequency. However, this increases the power consumption of the stream processing unit in the stream-rate domain. To meet the performance requirements without increasing the operating frequency, we introduce an intermediate stream buffer. With the intermediate stream buffering depicted in Fig. 3.78b, the outputs of S0 and S1 are stored in the intermediate stream buffer. As the time chart shows, S1 and S2 can start processing independently of the time slot, and S2 finishes before the end of P1. Therefore, the start of P2 is not delayed beyond its defined time slot, and with intermediate stream buffering each picture can start its pixel decoding at every time slot. Thus, the two-domain structure with the intermediate stream buffer can handle all pictures at the average frequency, which keeps the required operating frequency, and hence the power consumption, low.
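
The effect of the buffer on the schedule can be illustrated with a toy model. The sketch below is not taken from Fig. 3.78: the function name, the slot length, and the per-picture stream processing times are hypothetical. It also assumes that the input stream is always available and it ignores the duration of pixel decoding itself. It only shows that, when stream processing may run back-to-back instead of waiting for its own time slot, a picture with a long stream processing time no longer delays its pixel decoding, provided the average stream time stays below the slot length.

```python
from typing import List


def pixel_start_times(stream_times: List[int], slot: int, buffered: bool) -> List[int]:
    """Toy schedule: when can pixel decoding P_k of each picture start?

    stream_times[k] is the (hypothetical) stream processing time S_k.
    P_k is nominally scheduled at the start of slot k + 1 and cannot
    start before S_k has finished.
    """
    starts = []
    stream_done = 0  # time at which the stream unit becomes free
    for k, s in enumerate(stream_times):
        if buffered:
            # With an intermediate buffer, S_k starts as soon as S_(k-1) ends.
            stream_begin = stream_done
        else:
            # Without it, S_k waits for the start of its own time slot.
            stream_begin = max(stream_done, k * slot)
        stream_done = stream_begin + s
        nominal = (k + 1) * slot
        starts.append(max(nominal, stream_done))
    return starts


# Hypothetical workload: the third picture needs more than one slot of
# stream processing, but the average (7.25) is below the slot length (10).
times = [4, 6, 14, 5]
print(pixel_start_times(times, slot=10, buffered=False))
print(pixel_start_times(times, slot=10, buffered=True))
```

In this example the unbuffered schedule starts the third picture's pixel decoding late, while the buffered schedule keeps every start on its slot boundary.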

The intermediate stream format has two segments, one in fixed-length coding and the other in variable-length coding, and both are processed per symbol (not per bit). The fixed-length part carries macroblock information, including the slice boundaries, coded block pattern, quantization scale parameter, and several other items. The variable-length part contains the remaining syntax elements (motion vectors and transform coefficients) in exponential-Golomb coding, which is a common, simple, and highly structured technique.
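
To make the structure of exponential-Golomb coding concrete, here is a minimal sketch of the order-0 code in its unsigned and signed forms, following the ue(v)/se(v) conventions of H.264. The exact symbol mapping used inside the intermediate stream is not given in the text, so the function names and the signed mapping below are assumptions for illustration only.

```python
from typing import Tuple


def ue_encode(value: int) -> str:
    """Unsigned order-0 Exp-Golomb code of a non-negative integer, as a bit string."""
    if value < 0:
        raise ValueError("ue(v) is defined for non-negative integers only")
    code = bin(value + 1)[2:]            # binary of value + 1, without the '0b' prefix
    return "0" * (len(code) - 1) + code  # leading zeros announce the code length


def se_encode(value: int) -> str:
    """Signed variant: map k > 0 to 2k - 1 and k <= 0 to -2k, then ue-encode."""
    mapped = 2 * value - 1 if value > 0 else -2 * value
    return ue_encode(mapped)


def ue_decode(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Decode one ue(v) symbol starting at pos; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":      # count the zero prefix
        zeros += 1
    end = pos + 2 * zeros + 1            # prefix + marker bit + suffix bits
    return int(bits[pos + zeros:end], 2) - 1, end


# Encode a few values and read them back symbol by symbol.
stream = "".join(ue_encode(v) for v in [0, 1, 2, 3, 7])
pos, decoded = 0, []
while pos < len(stream):
    value, pos = ue_decode(stream, pos)
    decoded.append(value)
print(stream, decoded)
```

Because the zero prefix gives the length of each codeword before its payload is read, a decoder can consume one whole symbol at a time, which fits the per-symbol processing described above.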

We evaluated the memory bandwidth between the stream and pixel domains. Although the stream processing unit and image processing units access the intermediate stream through the external synchronous DRAM (SDRAM), the required memory bandwidth is less than that of the conventional method (directly passing 16-bit-per-pixel transform coefficients). Figure 3.79 plots the compression ratio of the intermediate stream relative to the original stream for individual pictures of the H.264 conformance-test streams [75], other than those for I_PCM. The compression ratios are around 1.6 and 1.5 for CABAC and CAVLC, respectively; that is, the intermediate stream is about 1.6 and 1.5 times the size of the original stream. Although a portion of the intermediate stream is in fixed-length coding, the ratio stayed within 1.6 in the case of CABAC. Relative to the conventional method, the intermediate stream corresponds to a 95% reduction in required memory capacity and memory bandwidth for the processing of a 40-Mbps full HD stream (64 Mbps for the intermediate stream versus 90 Mpixels/s for the transform coefficients). Table 3.19 lists the bandwidths of all DMA channels in the video decoding process. The intermediate stream buffer's share of the bandwidth is only 4.8% and remains small even in the worst case. Therefore, the use of a stream buffer has only a small impact on power consumption.
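
The 95% figure can be checked from the numbers quoted above. The short calculation below uses only values stated in the text (40 Mbps, a ratio of 1.6, 90 Mpixels/s, 16 bits per pixel); the variable names are ours, and Mbps is taken as 10^6 bits per second.

```python
ORIGINAL_STREAM_MBPS = 40.0   # full HD input stream
SIZE_RATIO = 1.6              # intermediate stream / original stream (CABAC)
COEFF_RATE_MPIXELS = 90.0     # transform coefficients per second, in millions
BITS_PER_COEFF = 16           # conventional method: 16 bits per pixel

# Bandwidth of the intermediate stream and of raw transform coefficients.
intermediate_mbps = ORIGINAL_STREAM_MBPS * SIZE_RATIO
conventional_mbps = COEFF_RATE_MPIXELS * BITS_PER_COEFF

# Fraction of bandwidth saved by passing the intermediate stream instead.
reduction = 1.0 - intermediate_mbps / conventional_mbps

print(f"intermediate stream: {intermediate_mbps:.0f} Mbps")
print(f"conventional method: {conventional_mbps:.0f} Mbps")
print(f"reduction: {reduction:.1%}")
```

The intermediate stream needs 64 Mbps against 1440 Mbps for raw coefficients, a reduction of roughly 95.6%, which agrees with the figure quoted above.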

3.4.2.3 Shift-Register-Based Bus Network and Macroblock-Level Pipeline Processing

As shown in Fig. 3.76, all submodules of the video codec are connected in a ring structure by a bidirectional 64-bit shift-register-based bus (SBUS). Figure 3.80 shows the architecture of the SBUS and the data flow in the macroblock-pipeline stages. The clockwise SBUS is the path for data readout from the external SDRAM. The counterclockwise SBUS is used for intermodule data transfer to the next stage
