Hardware Reference
In-Depth Information
[Figure: block diagram of the STX. Labeled blocks: instruction memory, data memory, table memory; a 2-way/3-stage pipeline with instruction fetch, instruction decode 0/1, and execution 0/1, each way containing an LD/ST unit and an ALU on 32-bit paths; a variable-length coding unit with Golomb enc/dec and table lookup; a partitioned register file with 4bit x 64 (Type 1), 16bit x 32 (Type 2), and 32bit x 32 (Type 3) partitions, a read port (32bit x 4), and a write port (32bit x 2); intermediate stream to internal bus.]
Fig. 3.87
Stream processor architecture
Also, the Golomb encoding/decoding process does not change from one video
coding standard to another. Therefore, we implemented the variable-length coding unit in the
STX as dedicated hardware. In contrast, syntax
analysis and context calculation have complicated data flows that vary with
each video coding standard, so they are implemented in firmware for each
standard. These processes also contain many branch operations. In general, a VLIW
architecture handles branch operations poorly, and branch-stall cycles
increase in proportion to the number of pipeline stages. For this reason, the STX
pipeline uses as few stages as possible.
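As an illustration of the kind of coding the dedicated unit handles, the sketch below encodes and decodes order-0 Exp-Golomb codewords, the variant used for unsigned syntax elements in H.264. The text does not say which Golomb variant the STX implements, so the choice of variant and the function names `exp_golomb_encode` and `exp_golomb_decode` are assumptions made for this example, not a description of the STX hardware.

```python
from typing import Tuple


def exp_golomb_encode(value: int) -> str:
    """Return the order-0 Exp-Golomb codeword for a non-negative integer as a bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    code = bin(value + 1)[2:]              # binary form of value + 1
    return "0" * (len(code) - 1) + code    # leading zeros, then the binary form


def exp_golomb_decode(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Decode one codeword starting at bits[pos]; return (value, next position)."""
    zeros = 0
    while pos + zeros < len(bits) and bits[pos + zeros] == "0":
        zeros += 1                         # count the zero prefix
    end = pos + 2 * zeros + 1              # prefix + marker bit + suffix
    if end > len(bits):
        raise ValueError("truncated codeword")
    value = int(bits[pos + zeros:end], 2) - 1
    return value, end


# Encode a few symbols back to back, then decode them one codeword at a time.
symbols = [0, 3, 1, 7]
stream = "".join(exp_golomb_encode(s) for s in symbols)

decoded = []
position = 0
while position < len(stream):
    symbol, position = exp_golomb_decode(stream, position)
    decoded.append(symbol)

print(stream)
print(decoded)
```

The codeword length depends only on the value being coded, not on the syntax element it belongs to, which is consistent with the text's point that this step can be fixed in hardware while the standard-specific syntax analysis stays in firmware.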
The STX also has an out-of-order execution feature. If the instruction decoder in
the STX finds no data dependency between a variable-length coding
instruction and the instructions that follow it, the pipeline executes the next
instruction even though the variable-length coding instruction has
not finished. This feature allows the symbol-level processing of variable-length
coding to be pipelined with syntax analysis and context calculation. This pipelining
is effective for improving performance on stream data that
have high bit rates and contain a lot of residual data.
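The effect of this dependency check can be sketched with a toy timing model. Everything in it is hypothetical: the register names, the 4-cycle latency of the variable-length coding instruction, and the single-issue simplification are assumptions chosen for illustration and are not taken from the STX design. The model checks only read-after-write and write-after-write hazards on the destination of an earlier instruction.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: Optional[str]        # register written, if any
    srcs: Tuple[str, ...]      # registers read
    latency: int = 1           # cycles until the result is ready


def depends_on(later: Instr, earlier: Instr) -> bool:
    """True if `later` reads or overwrites the destination of `earlier`."""
    if earlier.dest is None:
        return False
    return earlier.dest in later.srcs or earlier.dest == later.dest


def total_cycles(program: List[Instr], overlap: bool) -> int:
    """Cycles to finish `program`, issuing at most one instruction per cycle in order."""
    cycle = 0                  # earliest cycle in which the next instruction may issue
    pending = []               # (instruction, cycle in which its result is ready)
    finish = 0
    for instr in program:
        if overlap:
            # Wait only for in-flight instructions this one depends on.
            for earlier, ready in pending:
                if depends_on(instr, earlier):
                    cycle = max(cycle, ready)
        else:
            # No overlap: wait for every earlier instruction to finish.
            cycle = max(cycle, finish)
        ready = cycle + instr.latency
        pending.append((instr, ready))
        finish = max(finish, ready)
        cycle += 1
    return finish


program = [
    Instr("vlc_decode", "r1", ("stream",), latency=4),  # long-running VLC instruction
    Instr("ctx_update", "r2", ("r3",)),                 # independent of r1
    Instr("add", "r4", ("r2", "r5")),                   # independent of r1
    Instr("use_symbol", "r6", ("r1",)),                 # needs the decoded symbol
]

print(total_cycles(program, overlap=False))  # 7
print(total_cycles(program, overlap=True))   # 5
```

In this toy program the two context instructions run while the variable-length coding instruction is still in flight, and only the instruction that consumes the decoded symbol waits for it.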
Calculating the context for a symbol in a video stream requires various previously
decoded symbols. For efficient access, these symbols are held
in the register file. Before designing the STX, we estimated the number of register file
entries required from the specifications of several video coding standards.
Based on this estimate, we determined that 128 entries are sufficient for
storing previously decoded symbols while encoding or decoding various video
streams. However, implementing 128 entries × 32 bits (4,096 bits) in flip-flops would require a large amount of hardware.
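The storage figures can be checked with a few lines of arithmetic. The partition sizes below are read from the register file labels in Fig. 3.87, and interpreting a label such as "4bit x 64" as 64 entries of 4 bits each is an assumption of this sketch.

```python
# Uniform register file described in the text: 128 entries of 32 bits each.
uniform_bits = 128 * 32

# Partition labels from Fig. 3.87, read as (width in bits, number of entries).
partitions = {
    "Type 1": (4, 64),
    "Type 2": (16, 32),
    "Type 3": (32, 32),
}

partitioned_entries = sum(entries for _, entries in partitions.values())
partitioned_bits = sum(width * entries for width, entries in partitions.values())

print(uniform_bits)         # 4096
print(partitioned_entries)  # 128
print(partitioned_bits)     # 1792
```

Under that reading, the three partitions still provide 128 entries in total while using fewer than half the bits of a uniform 32-bit register file.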
 