Digital Signal Processing Reference
These developments pose a qualitatively novel challenge to the portability of
specifications, applications, and ultimately the software used to implement
them, as well as to software engineering and implementation methodology in
general. While sequential software used to run faster automatically on a faster
processor, improving the performance of an application on a new platform that
provides more parallelism is predicated on the ability to exploit that
parallelism effectively, i.e. to parallelize the application and thus match it to the
respective computing substrate. Traditionally, applications described in the style
of mostly sequential algorithms have taken advantage of multiple execution units
using threads and processes, thereby explicitly structuring an application into a
(usually small) set of concurrently executing sequential activities that interact
with each other through shared memory or other means of communication (e.g.
messages, pipes, semaphores), often provided by the operating system or some
middleware.
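As a minimal sketch of this traditional style (the producer/consumer structure here is illustrative, not taken from the text), two sequential activities can run as threads and interact by passing messages through a queue:

```python
import threading
import queue

# Two concurrently executing sequential activities that interact by
# message passing (one of the communication mechanisms mentioned above).
def producer(q: queue.Queue, n: int) -> None:
    for i in range(n):
        q.put(i * i)   # send a message
    q.put(None)        # sentinel: no more data

def consumer(q: queue.Queue, results: list) -> None:
    while True:
        item = q.get()  # receive a message (blocks until one arrives)
        if item is None:
            break
        results.append(item)

q: queue.Queue = queue.Queue()
results: list = []
t1 = threading.Thread(target=producer, args=(q, 5))
t2 = threading.Thread(target=consumer, args=(q, results))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # [0, 1, 4, 9, 16]
```

Note that even this tiny example hard-codes its degree of parallelism (two threads), which is precisely the portability problem discussed next.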
However, this parallel programming approach has some significant drawbacks. First,
it poses considerable engineering challenges—large collections of communicating
threads are difficult to test since errors often arise due to the timing of activities in
ways that cannot be detected or reproduced easily, and the languages, environments,
and tools usually provide little or no support for managing the complexities of
highly parallel execution. Second, a thread-based approach scales poorly across
platforms with different degrees of parallelism if the number of execution units
is significantly different from the number of threads. Too few execution units
mean that several threads need to be dynamically scheduled onto each of them,
incurring scheduling overhead. If the number of processors exceeds the number of
threads, the additional processors remain unused. The consequence is that a threaded
application either needs to be over-engineered to use as many threads as possible,
with the attendant costs in engineering effort and in performance on less
parallel hardware, or it will underutilize highly parallel platforms. Either way,
the requirement to construct an application with a particular degree of parallelism
in mind is a severe obstacle to the portability of threaded software. In an effort
to implement sequential or threaded applications on platforms that provide higher
degrees of parallelism than the application itself, parallelizing compilers have been
used with some success. However, the effectiveness of automatic parallelization
depends highly on the application and the details of the algorithm description, and
it does not scale well to larger programs. For well-behaved specifications and the
corresponding software to scale to future parallel computing platforms as seamlessly
as possible, it is necessary to describe algorithms in a way that:
1. exposes as much parallelism of the application as practical,
2. provides simple and natural abstractions that help manage the high degree
of parallelism and permit principled composition of and interaction between
modules,
3. makes minimal assumptions about the physical architecture of the computing
machine it is implemented on,
4. is efficiently implementable on a wide range of computing substrates, including
traditional sequential processors, shared-memory multicores, manycore proces-
sor arrays, and programmable logic devices, as well as combinations thereof.