(register allocation). Scheduling refers to the process of organizing the instructions
in a timed sequence. The schedule can be computed statically (for RISC, DSPs
and VLIWs) or dynamically at runtime (for superscalars), whereas the mapping
of operations to instructions is always computed statically. The main purpose of
mapping and scheduling in uni-processor compilers has always been to improve
performance. Code size is also an important objective for embedded (especially
VLIW) processors. Only recently has power consumption become an issue; however,
the reduction in power consumption achievable with back-end techniques has little
impact on the overall system power consumption.
In an MPSoC compiler, similar operations have to be performed. Mapping, in
this context, refers to the process of assigning code blocks to PEs and logical
communication links to physical ones. In contrast to the uni-processor case,
mapping can also be dynamic: a code block could be mapped at runtime to different
PEs, depending on the availability of resources. Scheduling for MPSoCs has a similar
meaning as for uni-processors, but instead of scheduling instructions, the compiler
has to schedule code blocks. The presence of different application classes, e.g. real-time
applications, adds complexity to the optimizations in the compiler. In particular, there is
much more room for improving power consumption in an MPSoC; after all, power
consumption is one of the MPSoC drivers in the first place.
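Dynamic mapping can be pictured as a simple runtime dispatcher that assigns each code block to whichever PE currently has the most available resources. The following sketch illustrates the idea; the PE names, block names and cost values are hypothetical, and real runtime systems use far more elaborate resource models.

```python
# A minimal sketch of dynamic mapping: each incoming code block is
# assigned at runtime to the currently least-loaded PE. Block costs
# (estimated execution times) and PE names are invented for illustration.

def dynamic_map(blocks, pes):
    """Assign each (name, cost) block to the least-loaded PE."""
    load = {pe: 0 for pe in pes}       # accumulated execution time per PE
    mapping = {}
    for name, cost in blocks:
        pe = min(load, key=load.get)   # PE with the most available capacity
        mapping[name] = pe
        load[pe] += cost
    return mapping, load

blocks = [("f1", 4), ("f2", 3), ("f3", 5), ("f4", 2)]
mapping, load = dynamic_map(blocks, ["RISC1", "RISC2", "RISC3"])
print(mapping)  # {'f1': 'RISC1', 'f2': 'RISC2', 'f3': 'RISC3', 'f4': 'RISC2'}
```

A static mapping, by contrast, would fix this assignment at compile time regardless of the actual runtime load.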
The result of scheduling and mapping is typically represented in the form of a Gantt
chart, similar to the ones presented in Fig. 10. The PEs are represented on the
vertical axis and time on the horizontal axis. Code blocks are placed in the plane
according to the mapping and scheduling information. In Fig. 10a, functions f1
and f2 are mapped to processor RISC1, functions f3 and sum are mapped to
processor RISC2, and function f4 to processor RISC3.
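The mapping of Fig. 10a can be encoded directly as a per-PE list of scheduled blocks and rendered as a textual Gantt chart. The start times and durations below are invented purely for illustration; the figure itself does not specify them.

```python
# A sketch of a Gantt-chart representation: each PE holds a list of
# (block, start, duration) entries. Times are hypothetical.

schedule = {
    "RISC1": [("f1", 0, 3), ("f2", 3, 3)],
    "RISC2": [("f3", 0, 4), ("sum", 6, 2)],
    "RISC3": [("f4", 0, 5)],
}

for pe, entries in schedule.items():
    row = " ".join(f"{name}[{s}-{s + d}]" for name, s, d in entries)
    print(f"{pe}: {row}")
```

Running this prints one row per PE, e.g. `RISC1: f1[0-3] f2[3-6]`, which is exactly the information a graphical Gantt chart conveys.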
Given that code blocks have a higher timing variability than instructions, schedul-
ing can rarely be performed statically. A purely static schedule requires full knowledge
of the timing behavior and is only possible for very predictable architectures and
regular computations, as in the case of systolic arrays [48]. If a purely static schedule
cannot be obtained, some kind of synchronization is needed, and different
scheduling approaches require different synchronization schemes with different
associated performance overheads. In the example of the previous section, the timing
information of task T3 is not known precisely, as shown in Fig. 10b. Therefore, the
exact starting time of function sum cannot be determined, and a synchronization
primitive has to be inserted to ensure correctness of the result. In this example, a
simple barrier is enough to ensure that the execution of T3 on processors
RISC3, RISC4 and RISC5 has finished before function sum executes.
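The barrier pattern can be sketched with ordinary threads standing in for the PEs. Here, three workers play the roles of the T3 instances on RISC3, RISC4 and RISC5; the partial computation assigned to each worker is a made-up placeholder.

```python
# A minimal sketch of barrier synchronization: sum may only execute
# after all three T3 instances have produced their partial results.
import threading

results = [0, 0, 0]
barrier = threading.Barrier(4)     # 3 workers + the thread running sum

def t3(worker_id):
    # Placeholder for the real T3 computation on one PE.
    results[worker_id] = (worker_id + 1) ** 2
    barrier.wait()                 # signal that this T3 instance is done

threads = [threading.Thread(target=t3, args=(i,)) for i in range(3)]
for t in threads:
    t.start()

barrier.wait()                     # blocks until every T3 instance finishes
total = sum(results)               # corresponds to function sum in Fig. 10
for t in threads:
    t.join()
print(total)                       # 1 + 4 + 9 = 14
```

Without the barrier, `sum` could read `results` while some worker is still writing, producing an incorrect total; the barrier makes the completion of all T3 instances a precondition for executing `sum`.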
2.4.1 Scheduling Approaches
Which scheduling approach to utilize depends on the characteristics of the applica-
tion and the properties of the underlying MoC used to describe it. Apart from pure
static schedules, one can distinguish among the following scheduling approaches: