A Cache-Aware Strategy for H.264 Decoding on Multi-processor Architectures - VLSI Design and Test

Fig. 2. 2D wavefront in action

in action on a frame with 99 MBs, in a 4 processor setup. A, B, C, and D are

the 4 processors. Each MB is labelled with the processor to which it is assigned.

Also, the number associated with each MB denotes the cycle at which the MB

will be processed. For example, the topmost leftmost MB (labelled as 1A) will

be processed in the first cycle on processor A. The next MB to its right, being

dependent on it (1A), cannot start till it finishes, and hence, is assigned to time

unit 2 on the same processor. The entire frame is processed in 38 time units on

4 processors, following the schedule given by the MB labels in Figure 2.

To improve scalability, this has been further extended to a 3-dimensional

approach (3D wavefront), where two or more frames are decoded simultaneously

depending on the number of idle cores in the multiprocessor system [2].
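
The 2D wavefront schedule described for Figure 2 can be sketched as a small greedy simulation. This is only an illustration under stated assumptions, not the scheduler that produced the figure: it assumes unit processing time per MB, an 11 x 9 MB frame (one layout that yields 99 MBs; the frame shape is an assumption), and a raster-order tie-break among ready MBs. The name `wavefront_schedule` and the way processor labels are handed out are hypothetical, and the resulting makespan need not equal the 38 time units of Figure 2.

```python
from string import ascii_uppercase


def wavefront_schedule(width, height, num_procs):
    """Greedy 2D wavefront with unit-time MBs (hypothetical helper).

    Returns {(row, col): (cycle, processor_label)}, cycles starting at 1.
    Labels come from A..Z, so this sketch supports at most 26 processors.
    """
    def neighbours(r, c):
        # Left, top-left, top and top-right neighbours that lie inside the frame.
        cand = [(r, c - 1), (r - 1, c - 1), (r - 1, c), (r - 1, c + 1)]
        return [(y, x) for y, x in cand if 0 <= y < height and 0 <= x < width]

    schedule = {}
    pending = [(r, c) for r in range(height) for c in range(width)]  # raster order
    cycle = 0
    while pending:
        cycle += 1
        # An MB is ready once all of its neighbours finished in an earlier cycle.
        ready = [mb for mb in pending
                 if all(n in schedule for n in neighbours(*mb))]
        chosen = ready[:num_procs]  # at most one MB per processor per cycle
        for proc, mb in enumerate(chosen):
            schedule[mb] = (cycle, ascii_uppercase[proc])
        pending = [mb for mb in pending if mb not in schedule]
    return schedule


sched = wavefront_schedule(width=11, height=9, num_procs=4)  # assumed 11 x 9 frame
makespan = max(cycle for cycle, _ in sched.values())
print(sched[(0, 0)], sched[(0, 1)], makespan)
```

As in the text, the topmost leftmost MB is placed in cycle 1 on processor A, and its right neighbour, which depends on it, follows in cycle 2 on the same processor.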

3 Motivation and Objectives

Our work rests on several considerations that make it different from approaches

proposed in the literature. Static approaches to parallel decoding [3] generally

assume a regular dependency structure for each MB and equal processing times,

i.e. each MB depends on all four of its neighbours [1] (top left, top, top right,

and left), depending on which of these are actually present at its

position (the top-row MBs other than the leftmost one, for example, have only

left dependency edges). However, the actual dependencies are input-dependent

and vary across MBs. In effect, an MB

can turn out to depend on one, two, three, all, or none of its neighbours,

a fact that can be exploited to improve decode performance in a parallel setting.

This motivates a dynamic run-time scheduling strategy.
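
To make the effect of input-dependent dependencies concrete, the following minimal sketch schedules MBs at run time from whatever dependency sets are actually observed, rather than from the fixed four-neighbour pattern. It is an illustration only and not the strategy proposed in this work: the helper names `dynamic_schedule` and `worst_case_deps`, the unit processing time per MB, the 11 x 9 frame shape, and the "no dependencies at all" input are assumptions made up for the example.

```python
def dynamic_schedule(actual_deps, num_procs):
    """Run-time scheduling sketch (hypothetical helper, unit-time MBs).

    actual_deps maps each MB to the set of MBs it really depends on.
    Returns {mb: cycle}, cycles starting at 1.
    """
    done = {}
    pending = sorted(actual_deps)  # raster order for (row, col) keys
    cycle = 0
    while pending:
        cycle += 1
        # Ready = every MB this one actually depends on finished in an earlier cycle.
        ready = [mb for mb in pending
                 if all(d in done for d in actual_deps[mb])]
        if not ready:
            raise ValueError("dependencies can never be satisfied")
        for mb in ready[:num_procs]:
            done[mb] = cycle
        pending = [mb for mb in pending if mb not in done]
    return done


def worst_case_deps(width, height):
    """Static assumption: every MB depends on all neighbours that exist."""
    deps = {}
    for r in range(height):
        for c in range(width):
            cand = [(r, c - 1), (r - 1, c - 1), (r - 1, c), (r - 1, c + 1)]
            deps[(r, c)] = {(y, x) for y, x in cand
                            if 0 <= y < height and 0 <= x < width}
    return deps


W, H, P = 11, 9, 4                       # assumed frame shape and core count
static = worst_case_deps(W, H)
relaxed = {mb: set() for mb in static}   # made-up extreme: no MB depends on a neighbour
t_static = max(dynamic_schedule(static, P).values())
t_relaxed = max(dynamic_schedule(relaxed, P).values())
print(t_static, t_relaxed)
```

In the made-up extreme case the schedule is limited only by the number of cores (99 MBs on 4 cores need 25 unit-time cycles), whereas the full four-neighbour pattern can never finish an 11 x 9 frame in fewer than 27 cycles because of its longest dependency chain. Real streams fall somewhere between these two extremes.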

Secondly, static methods often schedule MBs at uniform intervals on all cores,

assuming all MBs have equal processing times. This assumption does not hold in

H.264, and it forces some of the cores to remain idle. For example, in Figure 2, if processor

A finishes processing MB 15A early, it has to wait for other cores. We assume
