frequency. However, this increases the power consumption of the stream processing
unit in the stream-rate domain. To meet the performance requirements without
increasing the operating frequency, we introduce an intermediate stream buffer.
With the intermediate stream buffering depicted in Fig. 3.78b, the outputs of S0
and S1 are stored in the intermediate stream buffer. As the time chart shows, S1 and
S2 can start processing independently of the time slots, and S2 finishes before the
end of P1. Therefore, the start of P2 is not delayed beyond its defined time slot, and
the pixel decoding of every picture can start at its own time slot. Thus, the
two-domain structure with the intermediate stream buffer can handle all pictures at
the average frequency, which helps keep the required operating frequency, and hence
the power consumption, low.
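The frequency argument above can be illustrated with a toy timing model. This is a minimal sketch under assumptions that are not taken from the text: the per-picture stream-processing loads are hypothetical megacycle counts, the unbuffered case is modeled as every picture having to finish its stream processing inside its own time slot, and the buffer is treated as large enough never to fill. The names `min_freq_lockstep`, `min_freq_buffered`, and `lead_slots` are illustrative only.

```python
from itertools import accumulate


def min_freq_lockstep(mcycles, fps):
    """Assumed unbuffered model: each picture's stream processing must finish
    inside its own time slot (1/fps seconds), so the clock is sized for the
    single heaviest picture. Returns MHz when mcycles are megacycles."""
    return max(mcycles) * fps


def min_freq_buffered(mcycles, fps, lead_slots=1):
    """Assumed buffered model: the stream unit processes pictures back to back
    and writes its output to an (unbounded) intermediate buffer. Picture i only
    has to be complete when its pixel decoding starts, (i + lead_slots) slots
    after time zero, so the clock is sized for the worst cumulative load."""
    return max(
        done * fps / (i + lead_slots)
        for i, done in enumerate(accumulate(mcycles))
    )


if __name__ == "__main__":
    # Hypothetical per-picture loads in megacycles; the third picture is a
    # large one (e.g., intra-coded) that would overrun a single slot.
    loads = [2, 2, 6, 2]
    fps = 30
    print("lockstep:", min_freq_lockstep(loads, fps), "MHz")
    print("buffered:", min_freq_buffered(loads, fps), "MHz")
    print("average: ", sum(loads) / len(loads) * fps, "MHz")
```

With these hypothetical loads at 30 pictures/s, the lockstep model needs 180 MHz to absorb the large picture, whereas the buffered model needs 100 MHz, close to the 90-MHz average. In this model the buffered requirement is set by the worst cumulative load rather than by the worst single picture, and a longer start-up delay (`lead_slots`) lowers it further at the cost of more buffer space.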
The intermediate stream format has two segments, one in fixed-length coding and
the other in variable-length coding, and both are processed per symbol (not per
bit). The fixed-length part carries macroblock information, including the slice
boundaries, coded block pattern, quantization scale parameter, and several other
items. The variable-length part contains the remaining syntax elements (motion
vectors and transform coefficients) in exponential-Golomb coding, which is a
common, simple, and highly structured technique.
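To illustrate the two-segment layout, the sketch below writes fixed-length macroblock fields into one bit segment and exponential-Golomb codes into another, then reads both back one symbol at a time. The unsigned (ue) and signed (se) exp-Golomb mappings follow the codes defined in H.264; everything else is an assumption for illustration: the field names and widths in `HEADER_FIELDS` are hypothetical, the number of symbols per macroblock is passed in explicitly, and bits are kept in Python lists for readability rather than packed into words as hardware would.

```python
# Hypothetical fixed-length macroblock fields: (name, width in bits).
HEADER_FIELDS = [("slice_start", 1), ("cbp", 6), ("qp", 6)]


class BitWriter:
    def __init__(self):
        self.bits = []

    def put_bits(self, value, width):
        """Fixed-length field, most significant bit first."""
        self.bits.extend((value >> (width - 1 - i)) & 1 for i in range(width))

    def put_ue(self, code_num):
        """Unsigned exp-Golomb: (n - 1) zeros, then code_num + 1 in n bits."""
        v = code_num + 1
        n = v.bit_length()
        self.bits.extend([0] * (n - 1))
        self.put_bits(v, n)

    def put_se(self, k):
        """Signed exp-Golomb mapping: k > 0 -> 2k - 1, k <= 0 -> -2k."""
        self.put_ue(2 * k - 1 if k > 0 else -2 * k)


class BitReader:
    def __init__(self, bits):
        self.bits, self.pos = bits, 0

    def get_bits(self, width):
        value = 0
        for _ in range(width):
            value = (value << 1) | self.bits[self.pos]
            self.pos += 1
        return value

    def get_ue(self):
        zeros = 0
        while self.bits[self.pos] == 0:  # count the leading-zero prefix
            zeros += 1
            self.pos += 1
        return self.get_bits(zeros + 1) - 1

    def get_se(self):
        c = self.get_ue()
        return (c + 1) // 2 if c % 2 else -(c // 2)


def encode_mb(header, mvd, coeffs):
    """Return (fixed_segment, vlc_segment) bit lists for one macroblock."""
    fixed, vlc = BitWriter(), BitWriter()
    for name, width in HEADER_FIELDS:
        fixed.put_bits(header[name], width)
    for value in mvd + coeffs:
        vlc.put_se(value)
    return fixed.bits, vlc.bits


def decode_mb(fixed_bits, vlc_bits, n_mvd, n_coeffs):
    """Read the fixed-length fields, then n_mvd + n_coeffs signed symbols."""
    fr, vr = BitReader(fixed_bits), BitReader(vlc_bits)
    header = {name: fr.get_bits(width) for name, width in HEADER_FIELDS}
    values = [vr.get_se() for _ in range(n_mvd + n_coeffs)]
    return header, values[:n_mvd], values[n_mvd:]


if __name__ == "__main__":
    w = BitWriter()
    for n in range(4):
        w.put_ue(n)  # code numbers 0..3 -> 1, 010, 011, 00100
    print("".join(map(str, w.bits)))
    fixed, vlc = encode_mb({"slice_start": 1, "cbp": 47, "qp": 28}, [3, -2], [5, 0, -1])
    print(decode_mb(fixed, vlc, 2, 3))
```

Because every exp-Golomb code word is a run of zeros followed by a same-length binary suffix, a decoder can extract one whole symbol per step without a code table, which is one reason the format suits symbol-level processing.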
We evaluated the memory bandwidth between the stream and pixel domains.
Although the stream processing unit and the image processing units access the
intermediate stream in the external synchronous DRAM (SDRAM), the required memory
bandwidth is lower than that of the conventional method, which stores the transform
coefficients directly in memory at 16 bits per pixel.
Figure 3.79 plots the compression ratio of the intermediate stream relative to the
original stream for individual pictures of the H.264 conformance-test streams [75],
excluding those for I_PCM. The ratios are around 1.6 and 1.5 for CABAC and CAVLC,
respectively; although part of the intermediate stream uses fixed-length coding,
the ratio stayed within about 1.6 even for CABAC. Relative to the conventional
method, the intermediate stream reduces the required memory capacity and memory
bandwidth by 95% when processing a 40-Mbps full HD stream (64 Mbps for the
intermediate stream versus 90 Mpixels/s, or 1,440 Mbps at 16 bits per pixel, for the
transform coefficients). Table 3.19 lists the bandwidths of all DMA channels in the
video decoding process. The intermediate stream buffer accounts for only 4.8% of the
total bandwidth, and its share remains small even in the worst case. Therefore, the
use of an intermediate stream buffer has only a small impact on power consumption.
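The 95% figure can be checked with a few lines of arithmetic using the numbers given above. This sketch assumes that the conventional method moves every transform coefficient once at 16 bits per pixel. The optional `pixel_rate_mpix` helper additionally assumes 1920 × 1080 4:2:0 video at 30 frames/s, which gives about 93 Mpixels/s and is consistent with the rounded 90 Mpixels/s in the text. Both function names are illustrative.

```python
def pixel_rate_mpix(width, height, fps, chroma_factor=1.5):
    """Luma plus chroma samples per second, in Mpixels/s.
    chroma_factor = 1.5 assumes 4:2:0 subsampling."""
    return width * height * chroma_factor * fps / 1e6


def bandwidth_reduction(stream_mbps, size_ratio, coeff_mpix_per_s, bits_per_coeff=16):
    """Compare intermediate-stream bandwidth with direct coefficient transfer.
    Returns (intermediate Mbps, conventional Mbps, fractional saving)."""
    intermediate_mbps = stream_mbps * size_ratio
    conventional_mbps = coeff_mpix_per_s * bits_per_coeff
    return intermediate_mbps, conventional_mbps, 1 - intermediate_mbps / conventional_mbps


if __name__ == "__main__":
    # Values from the text: 40-Mbps stream, ~1.6x size ratio (CABAC), 90 Mpixels/s.
    inter, conv, saving = bandwidth_reduction(40, 1.6, 90)
    print(f"intermediate {inter:.0f} Mbps, conventional {conv:.0f} Mbps, saving {saving:.1%}")
    print(f"1080p30 4:2:0 pixel rate: {pixel_rate_mpix(1920, 1080, 30):.1f} Mpixels/s")
```

With the values from the text this reports 64 Mbps against 1,440 Mbps, a saving of about 95.6%, in line with the quoted 95% reduction.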
3.4.2.3 Shift-Register-Based Bus Network and Macroblock-Level Pipeline Processing
As shown in Fig. 3.76, all submodules of the video codec are connected in a ring
structure by a bidirectional 64-bit shift-register-based bus (SBUS). Figure 3.80
shows the architecture of the SBUS and the data flow in the macroblock-pipeline
stages. The clockwise SBUS is the path for data readout from the external SDRAM.
The counterclockwise SBUS is used for intermodule data transfer to the next stage