Digital Signal Processing Reference
These developments pose a qualitatively novel challenge to the portability of
specifications, applications, and ultimately the software used to implement
them, as well as to software engineering and implementation methodology in
general. While sequential software used to run faster automatically on a faster
processor, improving the performance of an application on a new platform that
provides more parallelism is predicated on the ability to exploit that
parallelism effectively, i.e. to parallelize the application and thus match it to the
respective computing substrate. Traditionally, applications described in the style
of mostly sequential algorithms have taken advantage of multiple execution units
using threads and processes, thereby explicitly structuring an application into a
(usually small) set of concurrently executing sequential activities that interact
with each other through shared memory or other means of communication (e.g.
messages, pipes, semaphores), often provided by the operating system or some
middleware.
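As a minimal sketch of this traditional style (the producer/consumer structure here is illustrative, not taken from the text), two sequential activities can run as threads and interact by passing messages through a queue:

```python
import threading
import queue

# Two concurrently executing sequential activities that interact by
# message passing (one of the communication mechanisms mentioned above).
def producer(q: queue.Queue, n: int) -> None:
    for i in range(n):
        q.put(i * i)   # send a message
    q.put(None)        # sentinel: no more data

def consumer(q: queue.Queue, results: list) -> None:
    while True:
        item = q.get()  # receive a message (blocks until one arrives)
        if item is None:
            break
        results.append(item)

q: queue.Queue = queue.Queue()
results: list = []
t1 = threading.Thread(target=producer, args=(q, 5))
t2 = threading.Thread(target=consumer, args=(q, results))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # [0, 1, 4, 9, 16]
```

Note that even this tiny example hard-codes its degree of parallelism (two threads), which is precisely the portability problem discussed next.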
However, this parallel programming approach has some significant drawbacks. First,
it poses considerable engineering challenges—large collections of communicating
threads are difficult to test since errors often arise due to the timing of activities in
ways that cannot be detected or reproduced easily, and the languages, environments,
and tools usually provide little or no support for managing the complexities of
highly parallel execution. Second, a thread-based approach scales poorly across
platforms with different degrees of parallelism if the number of execution units
is significantly different from the number of threads. Too few execution units
mean that several threads need to be dynamically scheduled onto each of them,
incurring scheduling overhead. If the number of processors exceeds the number of
threads, the additional processors remain unused. The consequence is that a threaded
application either needs to be over-engineered to use as many threads as possible,
with the attendant costs in engineering effort and in performance on less
parallel hardware, or it will underutilize highly parallel platforms. Either way,
the requirement to construct an application with a particular degree of parallelism
in mind is a severe obstacle to the portability of threaded software. In an effort
to implement sequential or threaded applications on platforms that provide higher
degrees of parallelism than the application itself, parallelizing compilers have been
used with some success. However, the effectiveness of automatic parallelization
depends highly on the application and the details of the algorithm description, and
it does not scale well to larger programs. For well-behaved specifications and the
corresponding software to scale to future parallel computing platforms as seamlessly
as possible, it is necessary to describe algorithms in a way that:
1. exposes as much parallelism of the application as practical,
2. provides simple and natural abstractions that help manage the high degree
of parallelism and permit principled composition of and interaction between
modules,
3. makes minimal assumptions about the physical architecture of the computing
machine it is implemented on,
4. is efficiently implementable on a wide range of computing substrates, including
traditional sequential processors, shared-memory multicores, manycore proces-
sor arrays, and programmable logic devices, as well as combinations thereof.