and low power consumption with multiple video formats. In full HD video processing,
dynamic power is still the dominant component of power consumption in low-power CMOS
technology. Therefore, the focus was on reducing dynamic power in the video codec
design by exploiting the characteristics of video signal processing.
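The connection between clock control and power can be made explicit with the standard first-order expression for CMOS dynamic power. This is textbook background and is not a formula given in this section:

```latex
P_{\mathrm{dyn}} = \alpha \, C_{L} \, V_{DD}^{2} \, f
```

Here α is the switching activity factor, C_L the switched capacitance, V_DD the supply voltage, and f the clock frequency. Stopping a block's clock when it is idle removes its switching activity for those cycles. Lowering V_DD reduces power quadratically, which is why clock and voltage control are the natural levers for a low-power codec.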
Subsection 3.4.2 gives an overview of the video codec architecture. A two-domain
(stream-rate and pixel-rate) processing approach raises the performance of both the
stream and image processing units at a given operating frequency. In the image
processing unit, a dual macroblock-level pipeline with a shift-register-based ring
bus is introduced. This circuit is simple yet provides high throughput and reasonable
latency for video coding. Subsection 3.4.3 describes the stream processor and media
processor architectures. The media processor is used for transforms, subpixel motion
compensation, and the in-loop deblocking filter. Including the single stream processor,
a total of seven application-specific processors are integrated in the proposed video
codec. Subsection 3.4.4 discusses the results of implementing the VPU in terms of
performance and power consumption, and Subsection 3.4.5 concludes with a brief summary.
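The text does not specify the ring-bus protocol. The following Python sketch is a rough illustration of why a shift-register ring is simple to build, and it rests on several assumptions. Each core owns one register slot, and every slot shifts one hop per bus cycle. A core removes a packet addressed to it, and it injects its own pending packet only into an empty slot. The names `ShiftRingBus`, `Packet`, `send`, and `step` are hypothetical. The model ignores the dual macroblock pipeline and all real timing.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Packet:
    src: int      # index of the sending core
    dst: int      # index of the receiving core
    payload: str  # e.g. a macroblock identifier (hypothetical)


class ShiftRingBus:
    """Hypothetical ring of register slots, one slot per core."""

    def __init__(self, num_nodes: int) -> None:
        self.slots: List[Optional[Packet]] = [None] * num_nodes
        self.outbox: List[Deque[Packet]] = [deque() for _ in range(num_nodes)]
        self.inbox: List[List[Packet]] = [[] for _ in range(num_nodes)]

    def send(self, packet: Packet) -> None:
        # Queue the packet at its source core until a free slot passes by.
        self.outbox[packet.src].append(packet)

    def step(self) -> None:
        n = len(self.slots)
        # Shift every slot one hop around the ring: slot i receives slot i-1.
        self.slots = [self.slots[(i - 1) % n] for i in range(n)]
        for node in range(n):
            pkt = self.slots[node]
            # Deliver a packet that has reached its destination core.
            if pkt is not None and pkt.dst == node:
                self.inbox[node].append(pkt)
                self.slots[node] = None
                pkt = None
            # Inject a pending packet only if this core's slot is now empty.
            if pkt is None and self.outbox[node]:
                self.slots[node] = self.outbox[node].popleft()


if __name__ == "__main__":
    bus = ShiftRingBus(4)
    bus.send(Packet(src=0, dst=2, payload="MB0"))
    for _ in range(3):
        bus.step()
    print([p.payload for p in bus.inbox[2]])
```

With four cores, a macroblock sent from core 0 to core 2 enters the ring on the first cycle and is delivered two shifts later, on the third call to `step`. Each transfer step only moves registers to their neighbors, so the bus needs no central arbiter.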
3.4.2 Video Codec Architecture
3.4.2.1 Architecture Model
Figure 3.75 shows the basic architecture of the VPU, which is based on a heterogeneous
multicore approach; the concept is the same as that of the heterogeneous multicore
chip for embedded systems described in Chap. 2. To satisfy both the high-performance
and low-power requirements of advanced embedded systems with greater flexibility,
the VPU must exploit parallel processing by taking advantage of the data dependencies
in the video coding process.
Several low-power special-purpose processor (SPP) cores, several high-performance
application-specific hard-wired circuits (HWC), shared memory, and a global data
transfer unit (DTU) are embedded in the VPU. There are two types of SPP: a stream
processor and a media processor. Each processing core includes local memories
(LM) and a local DTU. These are embedded in the core so that internal operations in
the core can run in parallel with data transfers between cores and memories. Each
core processes the data in its LM while its DTU simultaneously executes
memory-to-memory data transfers between cores, shared memory, or off-chip memory
via the global DTU. The dynamic clock controller (DCC), which is connected to each
core, controls the clock supply of each core independently, reducing the dynamic
power consumption of the VPU. The shared memory is a medium-sized on-chip memory
that is used as a line buffer in vertical deblocking or as a reference image buffer
for motion estimation/compensation. Each core is connected to an on-chip interconnect
called the shift-register-based bus (SBUS), which is well suited to block-level
pipeline processing. Frequency and voltage control (FVC) is applied only at the top
level of the video processing unit.
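A simple cycle count shows how overlapping core computation with local DTU transfers saves time, and where per-core clock control could save power. The sketch below is an assumption, not the VPU's documented behavior. It assumes that each LM is split into two banks used in ping-pong fashion and that per-macroblock compute and transfer times are fixed. It also assumes the DCC stops a core's clock whenever the core is waiting for data. `CoreTiming`, the helper functions, and the cycle numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CoreTiming:
    compute_cycles: int   # hypothetical core cycles per macroblock
    transfer_cycles: int  # hypothetical DTU cycles to load one macroblock into LM


def serial_cycles(t: CoreTiming, num_mbs: int) -> int:
    """No overlap: the core waits for each transfer before computing."""
    return num_mbs * (t.transfer_cycles + t.compute_cycles)


def overlapped_cycles(t: CoreTiming, num_mbs: int) -> int:
    """Ping-pong LM banks: the DTU fills one bank while the core uses the other."""
    if num_mbs == 0:
        return 0
    # The first transfer cannot be hidden. Each middle step costs the slower of
    # compute and transfer. The last compute has no transfer to overlap with.
    return (t.transfer_cycles
            + (num_mbs - 1) * max(t.transfer_cycles, t.compute_cycles)
            + t.compute_cycles)


def gated_fraction(t: CoreTiming, num_mbs: int) -> float:
    """Fraction of overlapped-run cycles in which the core is idle (clock could be stopped)."""
    total = overlapped_cycles(t, num_mbs)
    if total == 0:
        return 0.0
    active = num_mbs * t.compute_cycles
    return (total - active) / total


if __name__ == "__main__":
    timing = CoreTiming(compute_cycles=300, transfer_cycles=200)
    print(serial_cycles(timing, 10), overlapped_cycles(timing, 10),
          gated_fraction(timing, 10))
```

Take 10 macroblocks with 300 compute and 200 transfer cycles each. Serial execution takes 5000 cycles and overlapped execution 3200, of which 200 are idle core cycles that per-core clock gating could suppress. When transfers dominate, the idle fraction grows, and so does the potential saving from stopping the core's clock.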