• Texture Coding (TC) performs discrete cosine transform and quantization over motion-compensated residuals ('texture').
• Texture Update (TU) uses the output of TC to locally reconstruct the original frame as it would appear after decoding. This reconstructed frame can later serve as a reference frame.
• Entropy Coding (EC) encodes the motion vectors to produce a compressed bitstream.
• Bitstream Packetizing (BP) prepares the packets containing the output data.
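The four stages above can be sketched as a toy pipeline. This is an illustrative simplification, not the reference code: the quantization step, the 4-bit "entropy" code, and the packet size are all placeholder assumptions, and the 8x8 DCT of a real encoder is omitted.

```python
# Toy sketch of the four MPEG4 encoder stages; all parameters are
# illustrative assumptions, not values from the reference code.

def texture_coding(residual, qstep=8):
    # TC: quantize the motion-compensated residual.
    # (A real encoder first applies an 8x8 DCT; omitted here.)
    return [round(v / qstep) for v in residual]

def texture_update(coeffs, prediction, qstep=8):
    # TU: dequantize and add back the prediction, rebuilding the frame
    # exactly as the decoder will, for later use as a reference frame.
    return [c * qstep + p for c, p in zip(coeffs, prediction)]

def entropy_coding(motion_vectors):
    # EC: placeholder fixed-length code standing in for a real VLC table.
    return "".join(format(mv & 0xF, "04b") for mv in motion_vectors)

def bitstream_packetizing(payload, size=16):
    # BP: split the encoded bits into fixed-size packets.
    return [payload[i:i + size] for i in range(0, len(payload), size)]
```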
These functional blocks are implemented in application kernels (computationally intensive nested loops) that have been optimized for compilation on VLIW architectures, in particular for execution on the ADRES processor [5] used in our MPSoC platform.
The RRM can trade off application performance against resource usage by selecting a specific parallelization to execute on the platform. To do so, the RRM needs different parallel versions of the same application, i.e., different binaries that perform the same functionality while using different numbers of computing elements.
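A minimal sketch of this selection policy follows. The version table and its (cores, frames-per-second) figures are hypothetical, introduced only to illustrate the idea of picking the best-performing version that fits the free computing elements.

```python
# Hypothetical RRM trade-off sketch: among the available parallel
# versions of the application, pick the fastest one whose core
# requirement fits the currently free computing elements.
# The table below is illustrative, not measured data.

VERSIONS = {
    "seq":  {"cores": 1, "fps": 11},
    "par2": {"cores": 2, "fps": 20},
    "par3": {"cores": 3, "fps": 27},
}

def select_version(free_cores):
    feasible = {n: v for n, v in VERSIONS.items() if v["cores"] <= free_cores}
    if not feasible:
        return None  # no version fits the resource budget
    # Maximize performance within the budget.
    return max(feasible, key=lambda n: feasible[n]["fps"])
```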
A set of parallel versions has been generated starting from a sequential implementation based on the MPEG4 Simple Profile reference code. First, the initial sequential version was pruned and cleaned to set up the parallelization procedure. Then, the sequential application was parallelized using the MPSoC Parallelization Assist (MPA) tool [6].
MPA is a tool that supports MPSoC programmers in investigating different parallelization alternatives for a given application. Once the programmer specifies a parallelization, MPA automatically inserts into the sequential code all the program lines needed to spawn parallel threads and to implement inter-thread communication. To generate different versions of the same application, the programmer only has to profile the sequential application, understand how kernels can be assigned to different threads, and specify the different parallelization opportunities to MPA, without any further handmade modification of the application code.
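The kind of code MPA inserts can be illustrated, in spirit, by the following sketch: a thread is spawned per kernel and a queue carries data between them. This is not MPA output; the producer/consumer split and the squaring kernel are stand-ins chosen for the example.

```python
# Illustrative sketch (not MPA-generated code) of spawning parallel
# threads and adding inter-thread communication to a sequential loop.

import threading
import queue

def producer(out_q, data):
    for item in data:
        out_q.put(item)    # hand each work item to the next thread
    out_q.put(None)        # end-of-stream marker

def consumer(in_q, results):
    while True:
        item = in_q.get()
        if item is None:
            break
        results.append(item * item)  # stand-in for a real kernel

def run_pipeline(data):
    q = queue.Queue()
    results = []
    t1 = threading.Thread(target=producer, args=(q, data))
    t2 = threading.Thread(target=consumer, args=(q, results))
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results
```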
MPA can handle parallelization at the functional level, at the data level, or as a combination of both. In practice, different functional kernels can be distributed over different threads (functional parallelization), or the same kernel(s) can be split over different threads with respect to loop indices (data parallelization). In the latter case, each thread performs the same functionality over a different part of the dataset.
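Data-level parallelization can be sketched as follows: the same kernel runs in every thread, but each thread covers a disjoint slice of the loop index range. The row-sum kernel and the row-wise split are assumptions made for the example.

```python
# Sketch of data parallelization: one kernel, disjoint index slices,
# one slice per thread. The row-sum kernel is a placeholder.

import threading

def kernel(frame, lo, hi, out):
    # Each thread processes rows lo..hi-1 of the frame.
    for i in range(lo, hi):
        out[i] = sum(frame[i])

def data_parallel(frame, n_threads):
    out = [0] * len(frame)
    step = (len(frame) + n_threads - 1) // n_threads  # ceil division
    threads = [
        threading.Thread(
            target=kernel,
            args=(frame, t * step, min((t + 1) * step, len(frame)), out))
        for t in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```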
Once the different parallel versions are generated with MPA, the obtained code can be compiled and the resulting binaries executed on the target platform. In particular, Fig. 9.2 shows the parallel versions of the MPEG4 encoder studied in this chapter. In Fig. 9.2, the functional blocks are drawn as solid boxes, while the thread partitioning is represented by dotted lines.
In this case study, each thread needs a computing element on which to execute, and a computing element cannot execute more than one thread. Thus, the number of threads