A Cache-Aware Strategy for H.264 Decoding
on Multi-processor Architectures
Arani Bhattacharya 1, Ansuman Banerjee 1, Susmita Sur-Kolay 1,
Prasenjit Basu, and Bhaskar J. Karmakar 2,∗
1 Indian Statistical Institute
{arani89,prasenjit.basu}@gmail.com,
{ansuman,ssk}@isical.ac.in
2 S3Craft Technologies
bhaskar@s3craft.com
Abstract. H.264-AVC is one of the most popular formats for the recording,
compression and distribution of video. Encoders and decoders for the H.264
standard are widely in demand, and efficient strategies for enhancing their
performance have been areas of active research. With the proliferation of
many-core architectures in the embedded community, there has been a trend
towards parallelizing implementations of encoders and decoders. In this paper,
we present a run-time heuristic which exploits macro-block level parallelism
and efficient scheduling inside an H.264 decoder to reduce the number of cache
misses and improve processor utilization. Experiments on standard benchmarks
show a significant speed-up over contemporary strategies proposed in the
literature.
1 Introduction
H.264/MPEG-4 Part 10 or AVC (Advanced Video Coding) is one of the most
common video formats in recent times. H.264 provides much better compression
ratios than most other video formats such as H.263 and MPEG-2. Encoders and
decoders for the H.264 standard are widely in demand, and efficient strategies
for enhancing their performance have been areas of active research.
Security applications typically involve widespread deployment of H.264. In
the security context, videos are mostly intra-coded, i.e., all existing motion
dependencies lie within the same frame. Intra-coded videos have therefore been
a subject of active research in both academic and industrial settings.
With the proliferation of many-core architectures in the embedded community,
there has been a trend towards parallelizing implementations of encoders and
decoders. In general, these proposals have focused on efficient exploitation of
the inherent parallelism (at the frame level, slice level or macro-block level;
a minimal sketch of the macro-block level case is given after the footnote
below) in the
∗ This work was started while Bhaskar J. Karmakar and Prasenjit Basu were at
Texas Instruments, India. The authors would like to acknowledge the financial
assistance received from Texas Instruments, and thank Dr. Mahesh Mehendale,
Fellow, Texas Instruments, for his continuous encouragement and support.
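To make the macro-block level parallelism mentioned above concrete, the following is a minimal C sketch of the well-known 2D-wavefront dependency pattern for the macro-blocks (MBs) of an intra-coded frame: an MB at position (x, y) depends on its left, top-left, top and top-right neighbours, so all MBs sharing the same value of x + 2y are mutually independent and can be decoded in parallel. The frame dimensions and the decode_mb stub are illustrative assumptions for this sketch and are not taken from the paper.

/* gcc -std=c99 -o wavefront wavefront.c && ./wavefront */
#include <stdio.h>

/* Frame size in macro-blocks; 1280x720 pixels -> 80x45 MBs of 16x16
 * pixels.  These numbers are illustrative, not taken from the paper. */
#define MB_COLS 80
#define MB_ROWS 45

/* Hypothetical per-MB decode kernel (intra prediction, IDCT, ...). */
static void decode_mb(int x, int y) { (void)x; (void)y; }

int main(void)
{
    /* Intra prediction makes MB(x, y) depend on its left, top-left,
     * top and top-right neighbours.  All MBs with the same value of
     * x + 2*y are therefore mutually independent; the loop below
     * walks these "waves" in dependency-respecting order.           */
    int last_wave = (MB_COLS - 1) + 2 * (MB_ROWS - 1);

    for (int wave = 0; wave <= last_wave; wave++) {
        int ready = 0;
        for (int y = 0; y < MB_ROWS; y++) {
            int x = wave - 2 * y;
            if (x < 0 || x >= MB_COLS)
                continue;
            decode_mb(x, y);   /* in a real decoder, dispatched to a core */
            ready++;
        }
        /* 'ready' is the MB-level parallelism exposed at this wave;
         * it peaks at roughly min(MB_COLS / 2, MB_ROWS).             */
        printf("wave %3d: %2d macro-blocks decodable in parallel\n",
               wave, ready);
    }
    return 0;
}

In a multi-processor decoder, the ready macro-blocks of each wave would be dispatched across cores; the open question is then in what order and to which core each ready MB should go, which is where cache-aware scheduling of the kind studied in this paper comes in.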