A Cache-Aware Strategy for H.264 Decoding on Multi-processor Architectures - VLSI Design and Test

Fig. 1. A 3x3 H.264 frame

information such as intra prediction mode information and coded residual data.
Within a macro-block, luma samples may be coded using one of three block sizes:
4x4, 8x8 or 16x16 pixels. Chroma samples are commonly coded as blocks of 8x8
pixels.

Reconstruction is an important step in decoding an H.264 video frame.
Reconstruction of a decoded macro-block involves obtaining the data from the
neighbouring macro-blocks on which the encoder based its motion prediction. A
macro-block therefore cannot be reconstructed independently, but only after the
data of its neighbouring macro-blocks has been fetched. In an intra-coded video
frame, all dependencies are in the same frame of video. In addition, a
macro-block (MB) includes a variable amount of residual information that cannot
be inferred from previous MBs.
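
The neighbour dependency described above can be made concrete with a small
sketch. The excerpt does not list which neighbours are involved, so the set used
here (left, up-left, up and up-right) is an assumption based on how intra-frame
dependencies in H.264 are commonly described. The function name
`intra_neighbours` and the (x, y) macro-block coordinates are hypothetical and
chosen only for illustration.

```python
# Assumed dependency offsets (dx, dy): left, up-left, up, up-right.
# This neighbour set is an illustrative assumption, not taken from the text.
DEPENDENCY_OFFSETS = [(-1, 0), (-1, -1), (0, -1), (1, -1)]


def intra_neighbours(x, y, width, height):
    """Return the macro-blocks that MB (x, y) waits for in a width x height frame.

    Coordinates are in macro-block units; neighbours that fall outside the
    frame are simply dropped.
    """
    neighbours = []
    for dx, dy in DEPENDENCY_OFFSETS:
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            neighbours.append((nx, ny))
    return neighbours


if __name__ == "__main__":
    # Dependencies of every macro-block in a frame of 3x3 macro-blocks.
    for y in range(3):
        for x in range(3):
            print((x, y), "depends on", intra_neighbours(x, y, 3, 3))
```

Under this assumption the top-left macro-block has no dependencies, while an
interior macro-block has to wait for four neighbours.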

Attempts to parallelize the reconstruction step have been made at the frame
level, the slice level and the macro-block level. At the frame level, different
frames are decoded by different cores. However, this places too much pressure
on the memory system. Since there are no dependencies among macro-blocks across
slices, slice-level parallelism places much lower demands on the memory system.
However, since the number and dimensions of the slices are variable, it leads
to poor load-balancing. Thus, macro-block level parallelism is the most
commonly used technique to implement parallel H.264 decoders. [3] presents an
excellent survey of approaches to H.264 decoder parallelization proposed in the
literature.
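
To illustrate how much macro-block level parallelism a frame exposes, the
following sketch computes the earliest step at which each macro-block could be
reconstructed, given that all of its neighbours must be finished first. It is
not the method of [3] or of this text: the neighbour set (left, up-left, up,
up-right), the unit cost per macro-block and the names `earliest_steps` and
`group_by_step` are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed dependency offsets (dx, dy): left, up-left, up, up-right.
DEPENDENCY_OFFSETS = [(-1, 0), (-1, -1), (0, -1), (1, -1)]


def earliest_steps(width, height):
    """Map each macro-block (x, y) to the earliest step at which it can start.

    Every macro-block is assumed to take exactly one step, and a macro-block
    may start only after all of its in-frame neighbours have finished.
    """
    steps = {}
    # Raster order visits every dependency before the block that needs it.
    for y in range(height):
        for x in range(width):
            deps = [
                (x + dx, y + dy)
                for dx, dy in DEPENDENCY_OFFSETS
                if 0 <= x + dx < width and 0 <= y + dy < height
            ]
            # A block with no dependencies starts at step 0.
            steps[(x, y)] = 1 + max((steps[d] for d in deps), default=-1)
    return steps


def group_by_step(steps):
    """Return a list whose i-th entry holds the macro-blocks ready at step i."""
    groups = defaultdict(list)
    for mb, step in steps.items():
        groups[step].append(mb)
    return [groups[s] for s in sorted(groups)]


if __name__ == "__main__":
    for step, blocks in enumerate(group_by_step(earliest_steps(3, 3))):
        print(f"step {step}: {blocks}")
```

With these assumptions, macro-block (x, y) becomes ready at step x + 2y, so a
frame of 3x3 macro-blocks needs seven steps and never has more than two
macro-blocks ready at once. Larger frames expose more concurrency, which is why
the achievable speed-up depends on the frame dimensions.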

The 2D wavefront approach [3] for parallel H.264 decoding exploits macro-block
level parallelism and computes a static schedule and a processor allocation
strategy. This has been quite successful in practice and has proved to be an
efficient solution in a multi-processor setting. Figure 2 shows a snapshot of
this method
