Hardware Reference
In-Depth Information
[Figure: block diagram. Labels: Video Processing Unit; DCC; SPPa #0 to SPPa #m; SPPb #0 to SPPb #n; LM; DTU; shift-register-based bus (SBUS); HWC #0 to HWC #k; shared memory; global DTU; CPU #0; on-chip interconnect; FVC; off-chip memory]
Fig. 3.75 Architecture model of video processing unit (VPU)
When a program is executed on the heterogeneous multicore video processing unit, its work is divided at two levels: the frame (one picture) and the macroblock. The macroblock is the basic unit of video compression, and its size is fixed at 16 × 16 pixels in video coding standards such as H.264. In the 4:2:0 YCbCr format, each macroblock contains four 8 × 8 luminance (Y) blocks, one blue color-difference (Cb) block, and one red color-difference (Cr) block. Macroblocks can be subdivided further into smaller blocks called partitions. H.264, for example, supports block sizes as small as 4 × 4.
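The macroblock arithmetic above can be checked with a short sketch. The names `MacroblockLayout`, `layout_420`, `macroblocks_per_frame`, and `partitions_4x4_per_macroblock` are hypothetical helpers introduced only for illustration, and the 1920 × 1080 frame size is an assumption about what "full HD" means here, not a figure taken from the text.

```python
from dataclasses import dataclass

MB_SIZE = 16  # macroblock edge length in luma pixels


@dataclass(frozen=True)
class MacroblockLayout:
    luma_blocks: int  # number of 8x8 Y blocks
    cb_blocks: int    # number of 8x8 Cb blocks
    cr_blocks: int    # number of 8x8 Cr blocks
    samples: int      # total samples (Y + Cb + Cr) in one macroblock


def layout_420(block: int = 8) -> MacroblockLayout:
    """Block counts for one 16x16 macroblock in 4:2:0 YCbCr."""
    luma_blocks = (MB_SIZE // block) ** 2      # 2 x 2 = 4
    chroma_edge = MB_SIZE // 2                 # chroma is halved in both axes
    chroma_blocks = (chroma_edge // block) ** 2  # 1 block per chroma plane
    samples = MB_SIZE * MB_SIZE + 2 * chroma_edge * chroma_edge
    return MacroblockLayout(luma_blocks, chroma_blocks, chroma_blocks, samples)


def macroblocks_per_frame(width: int, height: int) -> int:
    """Macroblocks needed to cover a frame, rounding each axis up."""
    cols = -(-width // MB_SIZE)   # ceiling division
    rows = -(-height // MB_SIZE)
    return cols * rows


def partitions_4x4_per_macroblock() -> int:
    """Number of 4x4 partitions that tile one macroblock."""
    return (MB_SIZE // 4) ** 2


if __name__ == "__main__":
    print(layout_420())
    print(macroblocks_per_frame(1920, 1080))  # assumed full HD frame size
    print(partitions_4x4_per_macroblock())
```

Because 1080 is not a multiple of 16, the row count is rounded up, so the sketch counts the partially filled bottom row of macroblocks as full macroblocks.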
Each video component is executed in parallel on the processor core best suited to it, as shown in Fig. 3.75. Each core processes the data in its LM while the DTU simultaneously executes memory-to-memory transfers. During parallel operation, there are time slots in which a core has no data to process or transfer. During these time slots, the core directs its connected DCC to stop the clock automatically. This control reduces the redundant power consumption of each core and so lowers the power consumption of the heterogeneous multicore chip as a whole.
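The effect of stopping the clock in idle time slots can be illustrated with a toy model. This is a minimal sketch, not the chip's actual control logic: the function `clocked_slots`, the schedule values, and the use of "clocked slots" as a stand-in for dynamic power are all assumptions made for illustration.

```python
from typing import Dict, List


def clocked_slots(schedule: Dict[str, List[bool]], gated: bool) -> int:
    """Count the time slots in which a core's clock is running.

    schedule maps a core name to a list of flags, one per time slot,
    where True means the core has data to process or transfer.
    With gating, an idle core's clock is stopped for that slot.
    """
    total = 0
    for busy in schedule.values():
        total += sum(busy) if gated else len(busy)
    return total


# Hypothetical activity pattern for three units over four time slots.
schedule = {
    "SPPa#0": [True, True, False, True],
    "HWC#0": [False, True, True, False],
    "DTU": [True, False, False, True],
}

ungated = clocked_slots(schedule, gated=False)
gated = clocked_slots(schedule, gated=True)

if __name__ == "__main__":
    print(f"clocked slots without gating: {ungated}")
    print(f"clocked slots with gating:    {gated}")
    print(f"fraction of clocked slots saved: {1 - gated / ungated:.2f}")
```

In this model the saving depends only on how often each core is idle, which is why distributing work across specialized cores with uneven duty cycles leaves room for this kind of control.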
3.4.2.2 Stream Domain and Image Domain Processing
Figure 3.76 is a block diagram of the video processing unit, a heterogeneous multicore processing unit that applies our architecture model shown in Fig. 3.75.
The architecture consists of a stream-rate domain and a pixel-rate domain [74]. The two domains operate independently in a picture-level pipeline to achieve full HD performance at a lower operating frequency. At a given time, this video
 