video structure, with the aim of achieving better decode performance through load balancing and distribution of the workload among the available processors, while honouring video dependency constraints as applicable.
For intra-coded videos, strategies exploiting macro-block level parallelism have generally been found to be more successful. The problem in this setting is essentially to identify the macro-block dependency structure inside an H.264 slice/frame, in order to process the macro-blocks in parallel (honouring dependencies as applicable) on the available processors in a multi-core setting, with the objective of minimizing the end-to-end decode time. Both static and dynamic macro-block scheduling strategies have been proposed. Static scheduling strategies generally assume worst-case dependency patterns among the constituent macro-blocks and, often, equal processing times irrespective of macro-block type.
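To make the dependency structure concrete, the following sketch (our illustration, not taken from the paper) builds the worst-case intra dependency pattern of a W×H macro-block grid, assuming each macro-block may depend on its left, top-left, top, and top-right neighbours, as in H.264 intra prediction, and derives the earliest "wavefront" step at which each macro-block can be decoded in parallel:

```python
# Sketch: worst-case intra macro-block dependencies on a w x h MB grid,
# assuming each MB depends on its left, top-left, top and top-right
# neighbours (the H.264 intra-prediction pattern). From this we derive
# the earliest parallel step at which each MB can be decoded.

def intra_deps(x, y, w, h):
    """Neighbours that MB (x, y) may depend on for intra prediction."""
    cands = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
    return [(i, j) for (i, j) in cands if 0 <= i < w and 0 <= j < h]

def wavefront_steps(w, h):
    """Earliest step for each MB: one step after its latest dependency."""
    step = {}
    for y in range(h):          # row-major order: all deps lie in
        for x in range(w):      # earlier rows or to the left
            deps = intra_deps(x, y, w, h)
            step[(x, y)] = 1 + max((step[d] for d in deps), default=-1)
    return step

steps = wavefront_steps(4, 3)
# MB (0, 0) has no dependencies and decodes first; MBs that share a
# step value form one wavefront and can be processed in parallel.
```

With this dependency pattern the wavefront is the familiar "knight's move" diagonal: MB (x, y) becomes ready at step x + 2y.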
This paper rests on two key observations. First, a static scheduling approach that assumes uniform macro-block processing times leads to poor processor utilization. In reality, macro-block processing times vary with the inputs and the dependency structure. Hence, effective processor utilization can be improved by a dynamic scheduling approach that assigns macro-blocks to free processors as soon as they become ready, as opposed to a static solution that schedules at pre-defined intervals. Second, beyond improving utilization, we show that the effective speed-up obtained crucially depends on how the decode strategy interacts with the cache in a multi-processor setting with a hierarchical (private L1, shared L2, DRAM) memory structure. Many existing decode strategies do not consider the cache misses resulting from cache-oblivious selection of the macro-blocks to be processed, which leads to significant slowdown in decoder performance due to frequent accesses to the lower, slower memory levels.
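The dynamic approach described above can be sketched as a small list-scheduling simulation (our illustration, not the paper's algorithm): whenever a processor frees up, it grabs any macro-block whose dependencies are complete, so non-uniform processing times do not leave processors idle at fixed interval boundaries the way a static schedule would. Here `deps` maps each macro-block to its prerequisites and `cost` to its (possibly non-uniform) processing time:

```python
import heapq

# Sketch of dynamic macro-block scheduling on num_procs processors.
# deps: mb -> set of prerequisite mbs (every mb must appear as a key);
# cost: mb -> processing time. Returns the makespan (decode time).

def dynamic_schedule(deps, cost, num_procs):
    remaining = {mb: len(d) for mb, d in deps.items()}
    dependents = {mb: [] for mb in deps}
    for mb, d in deps.items():
        for p in d:
            dependents[p].append(mb)
    ready = [mb for mb, n in remaining.items() if n == 0]
    running = []                      # min-heap of (finish_time, mb)
    free, now, done = num_procs, 0.0, 0
    while done < len(deps):
        while ready and free > 0:     # start every ready MB we can
            mb = ready.pop()
            free -= 1
            heapq.heappush(running, (now + cost[mb], mb))
        now, mb = heapq.heappop(running)   # advance to next completion
        free += 1
        done += 1
        for succ in dependents[mb]:        # release dependent MBs
            remaining[succ] -= 1
            if remaining[succ] == 0:
                ready.append(succ)
    return now

deps = {'a': set(), 'b': set(), 'c': {'a', 'b'}}
cost = {'a': 2.0, 'b': 1.0, 'c': 1.0}
# two processors: 'a' and 'b' run in parallel, 'c' starts when 'a' ends
makespan = dynamic_schedule(deps, cost, 2)   # -> 3.0
```

A static schedule that assumed equal processing times would dispatch at fixed boundaries and waste the slack left by the shorter macro-block 'b'; the dynamic scheduler reclaims it automatically.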
Our work makes two proposals for harnessing the power of parallel computation in a multi-core setting. On one hand, we propose a cache-aware [5] scheduling strategy that minimizes the number of cache misses by carefully selecting the macro-blocks to be considered next, taking into account the chance that a macro-block they depend on gets evicted from the cache through a capacity or conflict miss. On the other hand, we attempt to increase the number of macro-blocks available for processing at every time point, which in turn implies better processor utilization and hence improved speedup.
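The selection idea can be illustrated by a minimal heuristic (our sketch, not the paper's exact policy): among the macro-blocks that are ready, prefer the one with the most dependencies still resident in cache, modelled here as a plain set of recently produced macro-blocks rather than a full LRU simulation:

```python
# Sketch of cache-aware selection: score each ready MB by how many of
# its prerequisite MBs are still cached, so reference data is reused
# before capacity or conflict misses force a refetch from slower memory.

def pick_next(ready, deps, cached):
    """Pick the ready MB whose prerequisite data is most likely cached."""
    return max(ready, key=lambda mb: sum(d in cached for d in deps[mb]))

# Example: MB 'c' needs 'a', MB 'd' needs 'b'; only 'a' is cached,
# so decoding 'c' next avoids a trip to the lower memory levels.
choice = pick_next(['c', 'd'], {'c': {'a'}, 'd': {'b'}}, {'a'})   # -> 'c'
```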
We implemented our scheduling heuristic and evaluated it on a number of standard benchmarks. Experiments show significant speedup compared to existing methods.
2 Background and Related Work
An H.264 video [1, 6] consists of a sequence of frames. A frame is an array of luma samples and two corresponding arrays of chroma samples. Each frame is further divided into spatial units called slices. A slice consists of blocks of 16×16 pixels, known as macro-blocks (MB). A macro-block contains type information describing the choice of methods used to code the macro-block and prediction