Digital Signal Processing Reference
Encoding Performance. Table 3 lists the overall frame encoding rate and speedup of
both the X264 code and our stream code. The stream code achieves a significant
speedup over the X264 code on all four platforms. It is notable that streaming, a
program level optimization technique that needs no special hardware support, obtains
this degree of speedup across such different architectures. The experimental results
also show that our encoder is capable of real-time 1080p H.264 video encoding on a
completely programmable processor, STORM. Meanwhile, the stream code on the GPU
achieves better encoding performance than other GPU implementations [5][6][7].
Table 3. Average performance on each processor for 200 frames of the 1080P HD video
sequences BlueSky, Station2 and Rush_hour: latency, compression rate and speedup
Comparison with Other Parallelization Techniques. In essence, streaming is a
model-based parallelization technique. Many general parallelization techniques also
exist for accelerating H.264 encoding, such as frame/slice level parallelism,
wavelet/MB pipeline and fine granularity block parallelism [8][10][11]. Compared with
streaming, each of these techniques addresses data dependency at one level only and
carries the risk of side effects. For example, frame level parallelism increases the
bandwidth requirement, while slice level parallelism reduces the compression rate. By
contrast, this paper presents a complete, synthesized technique based on the stream
model. On modern programmable processors, single level optimization is insufficient
for good performance, so multi-level translation and parallelism is an inevitable
trend. The novelty of this paper, however, lies in choosing the stream model as the
foundation for the other optimizations, which is a key reason for the improved
performance.
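To make the slice level trade-off mentioned above concrete, the following is a minimal Python sketch of how a frame's macroblock rows could be divided into slices. It is not the encoder described in this paper: the function names (`mb_rows`, `split_into_slices`) and the even row-based split are assumptions chosen for illustration. Because H.264 prediction does not cross slice boundaries, each range can be encoded independently, which is also why compression efficiency drops as the slice count grows.

```python
# Hypothetical helper names; an illustration of slice partitioning only,
# not the partitioning used by the paper's encoder.
MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples


def mb_rows(height_px):
    """Number of macroblock rows needed to cover a frame of the given height."""
    return (height_px + MB_SIZE - 1) // MB_SIZE  # round up to whole macroblocks


def split_into_slices(height_px, n_slices):
    """Split macroblock rows into n_slices contiguous [start, end) ranges.

    Rows are spread as evenly as possible; any remainder goes to the
    first slices so that sizes differ by at most one row.
    """
    rows = mb_rows(height_px)
    base, extra = divmod(rows, n_slices)
    slices = []
    start = 0
    for i in range(n_slices):
        size = base + (1 if i < extra else 0)
        slices.append((start, start + size))
        start += size
    return slices


if __name__ == "__main__":
    # A 1080-line frame is covered by 68 macroblock rows.
    print(split_into_slices(1080, 2))
```

With two slices, a 1080-line frame splits into two ranges of 34 macroblock rows each, and each range could be handed to a separate worker.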
Further speedup of our software H.264 encoder may be achievable with less effort
through algorithm level optimization, going beyond program level optimization. For
example, the K-L or wavelet transform could replace the DCT in transform coding, and
CABAC could replace CAVLC in entropy coding. However, this may have a downside: the
improvement may be tied to a specific algorithm, which limits the applicable domain
of the streaming technique. Furthermore, streaming can be combined with other
parallelization techniques. Our measurements on X86 show that slice level parallelism
(2 slices) accelerates HD H.264 encoding to 6.38 fps for the BlueSky 1080P sequence,
while the streaming approach accelerates it to 10.6 fps, and the combination of the
two techniques achieves 13.2 fps.
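The relative gains implied by these three X86 measurements can be worked out directly. The short Python sketch below uses only the frame rates quoted above; the dictionary keys and function names are illustrative assumptions. The unoptimized X86 baseline is not given in this passage, so only ratios between the three configurations are computed.

```python
# Frame rates reported in the text for the BlueSky 1080P sequence on X86.
# The key names are labels chosen for this sketch, not terms from the paper.
FPS = {
    "slice": 6.38,            # slice level parallelism, 2 slices
    "streaming": 10.6,        # streaming approach alone
    "slice+streaming": 13.2,  # both techniques combined
}


def ratio(faster, slower):
    """Relative speedup of one configuration over another."""
    return FPS[faster] / FPS[slower]


def frame_time_ms(config):
    """Average time spent per frame, in milliseconds."""
    return 1000.0 / FPS[config]


if __name__ == "__main__":
    print(f"streaming vs. slice:     {ratio('streaming', 'slice'):.2f}x")
    print(f"combined vs. streaming:  {ratio('slice+streaming', 'streaming'):.2f}x")
    print(f"combined vs. slice:      {ratio('slice+streaming', 'slice'):.2f}x")
    print(f"combined time per frame: {frame_time_ms('slice+streaming'):.1f} ms")
```

On these figures, streaming alone is about 1.66 times faster than two-slice parallelism, and adding slice parallelism on top of streaming yields a further factor of about 1.25, so the two techniques are complementary rather than redundant.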