Fig. 4.25 Structure of programmable video processing core VPU5
length coding for stream-rate domain (VLCS) codec. These units are connected by a
shift-register-based bus for fast, efficient transfer of processing data. Each
codec element consists of a DMAC and three programmable image processing
elements (PIPEs) for transform and prediction, motion compensation, and deblocking
filtering. Each PIPE consists of a load module, a two-dimensional ALU, and a store
module. The PIPEs are controlled by a microprogram, and the load/store modules
connect to the bus through a data I/O. The VPU5 supports a range of formats,
including MPEG-1/2/4, H.263, and H.264, at resolutions from QCIF to full HD. Its
programmability makes it possible to apply a new algorithm or to update an
existing one. The details of the VPU5 are described in Sect. 3.4.
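To illustrate why a microprogrammed PIPE makes the datapath reprogrammable, here is a toy Python model of the load module, 2D ALU, and store module sequence. It is a sketch under stated assumptions, not the VPU5's real microinstruction set: the shared buffer, the block contents, and the micro-operations `add_const` and `clip` are all hypothetical. Only the load, ALU-step, store structure and the idea of swapping the microprogram come from the text above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Block = List[List[int]]          # a 2D block of pixel values
Op = Callable[[Block], Block]    # one hypothetical 2D-ALU micro-operation


def load(mem: Dict[str, Block], addr: str) -> Block:
    """Load module: copy a 2D block in from a (hypothetical) shared buffer."""
    return [row[:] for row in mem[addr]]


def store(mem: Dict[str, Block], addr: str, block: Block) -> None:
    """Store module: copy the processed block back out."""
    mem[addr] = [row[:] for row in block]


def add_const(c: int) -> Op:
    """Hypothetical micro-op: add a constant to every element."""
    return lambda b: [[x + c for x in row] for row in b]


def clip(lo: int, hi: int) -> Op:
    """Hypothetical micro-op: saturate every element to [lo, hi]."""
    return lambda b: [[min(max(x, lo), hi) for x in row] for row in b]


@dataclass
class Pipe:
    """Toy PIPE: load -> sequence of 2D-ALU micro-ops -> store."""
    name: str
    microprogram: List[Op] = field(default_factory=list)

    def run(self, mem: Dict[str, Block], src: str, dst: str) -> None:
        block = load(mem, src)
        for op in self.microprogram:   # the microprogram sequences the ALU steps
            block = op(block)
        store(mem, dst, block)


mem = {"in": [[250, 5], [100, 0]]}
pipe = Pipe("example", [add_const(10), clip(0, 255)])
pipe.run(mem, "in", "out")
print(mem["out"])

# Swapping the microprogram changes the algorithm; the "hardware" model is unchanged.
pipe.microprogram = [clip(0, 100)]
pipe.run(mem, "in", "out2")
print(mem["out2"])
```

The design choice mirrors the point in the text: the load, ALU, and store structure stays fixed, and only the list of micro-operations changes when a new or updated algorithm is needed.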
4.4.5 Global Clock Tree Optimization
Because the RP-X integrates a wide variety of modules, it was important to reduce
the power consumed by unused modules through clock gating. The power consumption
of the clock buffers was particularly large. Figure 4.26 shows the clock buffer
deactivation circuits. In the conventional clock tree (i), the global clock trees from
the clock generator were divided logically into CLK0, CLK1, and CLK2, and Modules A,
B, and C were all clocked by the same tree, CLK0. However, Module C was located
far from Modules A and B, so its clock tree became a dedicated branch starting near
the clock generator, and this branch had to be activated even when Module C was
not in use. In contrast, Modules A and B successfully shared their clock tree, which
saved clock tree capacitance.
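To make the cost of the dedicated branch concrete, the following Python sketch models clock-net power as P = C · VDD² · f, since a clock net charges and discharges its capacitance once per cycle. It compares the conventional tree (i) with separately gating Module C's branch at the clock generator, as in the optimized tree (ii) of Fig. 4.26. All capacitances, the supply voltage, and the frequency are hypothetical values chosen for illustration, not RP-X figures, and the model assumes each tree can be stopped only at the clock generator.

```python
# Toy model of the clock-tree power in Fig. 4.26; all numbers are hypothetical.
VDD = 1.0            # supply voltage [V] (assumed)
F_CLK = 200e6        # clock frequency [Hz] (assumed)

C_TRUNK_AB = 20e-12  # clock tree shared by Modules A and B [F] (assumed)
C_BRANCH_C = 15e-12  # dedicated clock tree to distant Module C [F] (assumed)


def clock_power(cap: float) -> float:
    """A clock net charges and discharges once per cycle: P = C * VDD^2 * f."""
    return cap * VDD ** 2 * F_CLK


def power_conventional(ab_on: bool, c_on: bool) -> float:
    # (i): A, B, and C all hang off CLK0, so the whole tree, including C's
    # dedicated branch, toggles whenever any of the three modules needs a clock.
    return clock_power(C_TRUNK_AB + C_BRANCH_C) if (ab_on or c_on) else 0.0


def power_separated(ab_on: bool, c_on: bool) -> float:
    # (ii): CLK0_0 feeds A and B; CLK0_1 feeds C and is gated at the generator.
    p = clock_power(C_TRUNK_AB) if ab_on else 0.0
    if c_on:
        p += clock_power(C_BRANCH_C)
    return p


# Modules A and B running, Module C idle: the separated tree stops C's branch.
saving = power_conventional(True, False) - power_separated(True, False)
print(f"saved with Module C idle: {saving * 1e3:.1f} mW")
```

Under these assumptions the saving equals the full switching power of Module C's dedicated branch, which is why the long branch to a distant module is the part worth gating separately.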
After the power optimization (ii), the clock tree of Module C was separated and
gated at the clock generator as CLK0_1, whereas Modules A and B shared the clock
tree CLK0_0. In this way, the clock tree CLK0_1 can be stopped when