Fig. 3.88 Architecture of programmable image processing element (PIPE)
[Block diagram. Labels: PIPE; PC; shared instruction memory; loading PU, media PU, storing PU; ALU; media ALU; LD/ST; local data memory; DMAC; shift-register-based bus. Annotation: sub-MB (4x4) level pipeline processing (bit extension, transposition; 2-D processing; bit rounding, transposition).]
To reduce the number of flip-flops, the symbols are categorized into three types by
bit width: type 1 (1-4 bits), type 2 (4-16 bits), and type 3 (16-32 bits). This
categorization shows that the number of entries belonging to type 1 is about 1.8
times larger than that belonging to the other categories. Based on this result, our
register file architecture consists of three partitions: 64 type 1 entries (4 bits),
32 type 2 entries (16 bits), and 32 type 3 entries (32 bits), as shown in Fig. 3.87.
Compared with a 32-bit nonpartitioned register file, a 57% reduction in the number
of flip-flops is achieved.
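The flip-flop saving can be checked with a back-of-the-envelope count. The sketch below assumes one flip-flop per stored bit and ignores any control or addressing logic; the names `PARTITIONS` and `FULL_WIDTH` are illustrative and not taken from the source. Under these assumptions the count comes out at roughly 56%, close to the 57% reported; the simple model may omit details the authors included.

```python
# Partition sizes as stated in the text: name -> (entries, bits per entry).
PARTITIONS = {
    "type 1": (64, 4),
    "type 2": (32, 16),
    "type 3": (32, 32),
}
FULL_WIDTH = 32  # width of every entry in the nonpartitioned baseline


def flip_flops_partitioned(partitions):
    """Total storage bits when each partition uses its own entry width."""
    return sum(entries * width for entries, width in partitions.values())


def flip_flops_flat(partitions, width=FULL_WIDTH):
    """Total storage bits when every entry is stored at the full width."""
    total_entries = sum(entries for entries, _ in partitions.values())
    return total_entries * width


partitioned = flip_flops_partitioned(PARTITIONS)
flat = flip_flops_flat(PARTITIONS)
reduction = 1 - partitioned / flat

print(f"partitioned: {partitioned} bits, flat: {flat} bits")
print(f"reduction: {reduction:.1%}")
```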
The CABAC accelerator achieves a performance of two cycles per bit of the bin
string (an intermediate binary representation of the syntax elements), which
corresponds to three cycles per bit of the stream. This assumes that the compression
rate for the arithmetic coding is 1.5 and that single-cycle-access flip-flops are used
to update the context information. Taking the several cycles of processing overhead
into account, the performance is 40 Mbps at 162-MHz operation.
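The throughput figures above follow from simple arithmetic, sketched here. The code reads "compression rate" as bin-string bits per stream bit, which is what the text's two-to-three-cycle conversion implies; the function names are illustrative. The effective cycle count per stream bit is derived from the quoted 40 Mbps and is not stated in the source.

```python
def cycles_per_stream_bit(cycles_per_bin_bit, bins_per_stream_bit):
    """Convert a per-bin cost into a per-stream-bit cost."""
    return cycles_per_bin_bit * bins_per_stream_bit


def throughput_mbps(clock_mhz, cycles_per_bit):
    """Bits per second (in Mbps) at a given clock and per-bit cycle cost."""
    return clock_mhz / cycles_per_bit


CLOCK_MHZ = 162       # operating frequency from the text
REPORTED_MBPS = 40    # performance quoted in the text, overhead included

ideal_cycles = cycles_per_stream_bit(2, 1.5)          # 3 cycles per stream bit
ideal_mbps = throughput_mbps(CLOCK_MHZ, ideal_cycles)  # rate with no overhead
effective_cycles = CLOCK_MHZ / REPORTED_MBPS           # implied by 40 Mbps

print(f"ideal: {ideal_cycles} cycles/bit -> {ideal_mbps} Mbps")
print(f"reported: {REPORTED_MBPS} Mbps -> {effective_cycles:.2f} cycles/bit")
```

With these numbers the overhead-free rate is 54 Mbps, so the reported 40 Mbps corresponds to roughly one extra cycle per stream bit on average.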
3.4.3.2 Programmable Image Processing Element
To provide flexibility for handling multiple video standards, the following six
submodules of the image processing units are implemented as low-power media
processors [74]: two fine motion estimators/motion compensators (FME), two
transformers (TRF), and two in-loop deblocking filters (DEB). These modules are
shown in Fig. 3.76.
Figure 3.88 is a block diagram of the programmable image processing element
(PIPE). The PIPE is a tightly coupled multiple-processing-unit (PU) system that
consists of three PUs (the loading PU, media PU, and storing PU), a local data memory,
and a shared instruction memory. Each PIPE is capable of simultaneously loading
 