Fig. 7 Multiprocessing systems and high-end single processor energy consumption; α = δ is assumed
[Plot: 4-issue Superscalar, 8-Core MPSoC and 18-Core MPSoC; x-axis: parallelism percentage (α or δ), from 0.01 to 0.99]
consumes less energy than the 18-Core MPSoC for low and medium TLP values (δ < 0.75). However, when applications present greater thread-level parallelism (δ > 0.9), the energy consumed by the 18-Core multiprocessor reaches the same values as that of the 8-Core design, thanks to better utilization of the available processors.
2.5 DSP Applications on an MPSoC Organization
We evaluate superscalar and MPSoC performance by executing an actual DSP application. An 18-tap FIR filter is used to measure how both approaches handle a traditional DSP workload. The C-like description of the FIR filter employed in this experiment is shown in Fig. 8. Superscalar machines exploit the FIR filter's instruction-level parallelism transparently, working on the original binary code. In contrast, exploiting the potential of the MPSoC architecture requires manual source-code annotations that split the application code among many processing elements.
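As an illustration of what such an annotation looks like, below is a minimal C sketch of a direct-form FIR filter. It is not the code of Fig. 8: the names `fir`, `TAPS` and `N_OUT` and the output length are hypothetical. An OpenMP `#pragma omp parallel for` stands in for the manual annotation. Compilers ignore unrecognized pragmas, so without OpenMP support enabled the loop simply runs sequentially. That sequential form is what a superscalar processor would execute as-is.

```c
#include <assert.h>

#define TAPS 18   /* number of filter taps, as in the experiment */
#define N_OUT 32  /* hypothetical number of output samples */

/* Direct-form FIR filter:
 *   y[n] = sum_{k=0}^{TAPS-1} h[k] * x[n + TAPS - 1 - k]
 * x must hold n_out + TAPS - 1 input samples, so that every output
 * sample sees a full window of TAPS inputs. */
void fir(const float *x, const float *h, float *y, int n_out)
{
    assert(n_out >= 0);

    /* Stand-in for the manual annotation: output samples are independent,
     * so the outer loop can be split among cores. Compilers ignore this
     * pragma when OpenMP support is not enabled. */
    #pragma omp parallel for
    for (int n = 0; n < n_out; n++) {
        float acc = 0.0f;               /* private accumulator per output */
        for (int k = 0; k < TAPS; k++)  /* multiply-accumulate over the taps */
            acc += h[k] * x[n + TAPS - 1 - k];
        y[n] = acc;
    }
}
```

Each output sample depends only on the inputs and the coefficients, so the outer-loop iterations are independent and can safely be distributed across cores. One simple check is to feed a unit impulse placed at `x[TAPS - 1]`. The filter then returns its coefficients `h[0]` to `h[17]` followed by zeros.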
Fig. 8 simulates such annotations by highlighting parts of the code, indicating the number of cores needed to exploit the ideal thread-level parallelism of each part of the FIR filter code. For instance, the first annotation covers a loop controlled by the IMP SIZE value, which depends on the number of FIR taps. In this case, the loop performs 54 iterations, since the experiment uses an 18-tap FIR filter.
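The 54 iterations must be mapped onto whatever cores are available. The sketch below assumes a balanced, contiguous block split, which is a common choice. For comparison, OpenMP's static schedule without a chunk size only requires chunks of approximately equal size, at most one per thread, and leaves the exact split to the implementation. The helpers `block_range` and `iters_on_busiest_core` are hypothetical. They compute each core's share for the 8- and 18-core MPSoCs of Fig. 7 and for the ideal 54-core case.

```c
#include <assert.h>

/* Hypothetical helper: balanced block partition of n_iter loop iterations
 * over n_cores cores. Core c (0-based) gets the half-open iteration range
 * [*first, *last); the first (n_iter % n_cores) cores take one extra. */
void block_range(int n_iter, int n_cores, int c, int *first, int *last)
{
    assert(n_iter >= 0 && n_cores > 0 && c >= 0 && c < n_cores);
    int base  = n_iter / n_cores;   /* iterations every core receives */
    int extra = n_iter % n_cores;   /* leftovers spread over the first cores */
    *first = c * base + (c < extra ? c : extra);
    *last  = *first + base + (c < extra ? 1 : 0);
}

/* Hypothetical helper: iterations run by the most loaded core, i.e.
 * ceil(n_iter / n_cores). With equal-cost iterations, this count sets
 * the parallel execution time of the loop. */
int iters_on_busiest_core(int n_iter, int n_cores)
{
    assert(n_iter >= 0 && n_cores > 0);
    return (n_iter + n_cores - 1) / n_cores;
}
```

Here are the results for 54 iterations. On 8 cores, six cores run 7 iterations and two run 6, so the busiest core runs 7. On 18 cores, every core runs 3. On 54 cores, each core runs exactly 1.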
The OpenMP [4] application programming interface provides directives that easily split loop iterations among processing elements. With OpenMP directives, this loop is ideally exploited by a 54-core MPSoC, each core running a single loop iteration. However, when there are fewer processing elements than loop iterations, OpenMP combines