Fig. 2. 2D wavefront in action
in action on a frame with 99 MBs, in a 4-processor setup. A, B, C, and D are
the 4 processors. Each MB is labelled with the processor to which it is assigned.
Also, the number associated with each MB denotes the cycle at which the MB
will be processed. For example, the topmost leftmost MB (labelled as 1A) will
be processed in the first cycle on processor A. The next MB to its right depends
on 1A, cannot start until 1A finishes, and is hence assigned to time unit 2 on
the same processor. The entire frame is processed in 38 time units on
4 processors, following the schedule given by the MB labels in Figure 2.
To improve scalability, this scheme has been further extended to a 3-dimensional
approach (3D wavefront), in which two or more frames are decoded simultaneously,
depending on the number of idle cores in the multiprocessor system [2].
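As a rough illustration of the 2D wavefront labelling described above, the following Python sketch computes the earliest cycle in which each MB could start. It is only a sketch under stated assumptions: the 99-MB frame is taken to be 9 rows by 11 columns (the text gives only the MB count), every MB takes one time unit, every MB depends on all of its left, top-left, top, and top-right neighbours that exist, and the number of processors is unlimited. The function names are hypothetical. Because it ignores the 4-processor limit and the specific MB-to-processor assignment, it gives a lower bound rather than the 38 time units of Figure 2.

```python
def static_deps(row, col, rows, cols):
    """Neighbours an MB is assumed to wait for under the regular model:
    left, top-left, top, and top-right, keeping only those inside the frame."""
    candidates = [
        (row, col - 1),      # left
        (row - 1, col - 1),  # top-left
        (row - 1, col),      # top
        (row - 1, col + 1),  # top-right
    ]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def wavefront_cycles(rows, cols):
    """Earliest cycle (1-based) for each MB, assuming unit processing time
    and as many processors as needed (assumptions, not from the source)."""
    cycle = {}
    # Raster order guarantees every dependency is computed before its dependent.
    for r in range(rows):
        for c in range(cols):
            deps = static_deps(r, c, rows, cols)
            cycle[(r, c)] = 1 + max((cycle[d] for d in deps), default=0)
    return cycle


# Assumed layout: 9 rows x 11 columns = 99 MBs.
cycles = wavefront_cycles(9, 11)
lower_bound = max(cycles.values())
```

Under these assumptions the top-left MB gets cycle 1 and its right neighbour cycle 2, matching the labels 1A and 2A in the text, while the first MB of the second row has to wait for the top-right neighbour and gets cycle 3.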
3 Motivation and Objectives
Our work has several important considerations that make it different from the
approaches proposed in the literature. Static approaches to parallelizing
decoding [3] generally assume equal processing times and a regular dependency
structure for a MB, i.e. each MB is dependent on all four of its neighbours [1]
(top left, top, top right, and left), or on whichever of these are actually
present given its position (the top-row MBs other than the leftmost one, for
example, have only left dependency edges). In reality, however, the dependencies
are input-dependent and vary across MBs. In effect, a MB can turn out to depend
on one, two, three, all, or none of its neighbours, a fact that can be exploited
to improve decode performance in a parallel setting.
This motivates a dynamic run-time scheduling strategy.
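To make the effect of fewer dependencies concrete, the sketch below compares earliest start cycles under the regular four-neighbour model with those under a reduced dependency set. Everything in it is illustrative: the 3 x 4 frame, the unit processing time, the unlimited processor count, and the reduced set (left and top only) are assumptions chosen for the example, not values taken from the paper or from an actual H.264 stream, and the names are hypothetical.

```python
# Offsets (d_row, d_col) of the four candidate neighbours of an MB.
OFFSETS = {
    "left": (0, -1),
    "top_left": (-1, -1),
    "top": (-1, 0),
    "top_right": (-1, 1),
}


def neighbours(row, col, rows, cols, kinds):
    """Positions of the requested neighbour kinds that lie inside the frame."""
    out = []
    for kind in kinds:
        d_row, d_col = OFFSETS[kind]
        r, c = row + d_row, col + d_col
        if 0 <= r < rows and 0 <= c < cols:
            out.append((r, c))
    return out


def earliest_cycles(rows, cols, kinds_for):
    """Earliest cycle per MB when an MB may start as soon as the neighbours
    it actually depends on are done (unit time, unlimited processors)."""
    cycle = {}
    for r in range(rows):
        for c in range(cols):
            deps = neighbours(r, c, rows, cols, kinds_for(r, c))
            cycle[(r, c)] = 1 + max((cycle[d] for d in deps), default=0)
    return cycle


# Regular model: every MB waits for all four neighbours that exist.
static = earliest_cycles(3, 4, lambda r, c: tuple(OFFSETS))
# Hypothetical input: every MB happens to need only its left and top neighbours.
actual = earliest_cycles(3, 4, lambda r, c: ("left", "top"))

static_finish = max(static.values())
actual_finish = max(actual.values())
```

In this toy case the last MB becomes ready two cycles earlier once the unused top-left and top-right edges are dropped. A static schedule built on the regular model cannot exploit that slack, whereas a scheduler that checks the actual dependencies at run time can.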
Secondly, static methods often schedule MBs at uniform intervals on all cores,
assuming that all MBs have equal processing times. This assumption does not hold
in H.264, so some of the cores are forced to remain idle. For example, in Figure 2,
if processor A finishes processing MB 15A early, it has to wait for the other cores. We assume
 