Digital Signal Processing Reference
[Figure: bar chart of speedup (y-axis, 0 to 80) for each architecture (x-axis); bar labels 2.2, 6.0, 13.3, 17.3, 37.1, 44.8, 74.9]
Fig. 9 Speedup provided in 18-tap FIR filter execution for superscalar, MPSoC and a mix of both
approaches
Aiming to illustrate the impact of TLP and ILP exploration on the performance of
DSP applications, we evaluated the execution of the 18-tap FIR filter on three
kinds of architecture: a four-issue superscalar (SS); 6-, 18-, and 54-core MPSoCs
built from pipelined cores with no ILP exploration capabilities (MP_IOC); and,
to glimpse the future, hypothetical 6-, 18-, and 54-core MPSoCs built from
four-issue superscalar processors, able to explore both ILP and TLP (MP_SS).
We extracted the speedups with a tool [16] that builds all the data dependence
graphs of the application. Then, taking into account the characteristics of each
evaluated architecture, the execution time of each graph is measured in order to
obtain its speedup over the baseline processor. It is important to point out that
instruction and thread communication overheads were not taken into account in
this experiment.
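The text does not describe how [16] turns dependence graphs into execution times, so the following is only a minimal sketch of the general idea under our own assumptions: every operation takes one cycle, a machine issues at most `width` ready operations per cycle, and, as in the experiment, communication overhead is ignored. The functions `list_schedule` and `fir_output_dag` and the toy graph for a single FIR output are hypothetical; greedy critical-path list scheduling is a standard heuristic, not necessarily what [16] uses.

```python
def list_schedule(num_ops, deps, width):
    """Cycles needed to run a dependence DAG when up to `width` ops issue per cycle.

    Hypothetical model: ops are numbered 0..num_ops-1 in topological order,
    each op has a one-cycle latency, and there is no communication overhead.
    Ready ops with the longest remaining path to a sink are issued first.
    """
    preds = [set() for _ in range(num_ops)]
    succs = [[] for _ in range(num_ops)]
    for p, c in deps:
        if not p < c:
            raise ValueError("ops must be numbered in topological order")
        preds[c].add(p)
        succs[p].append(c)

    # Height = number of ops on the longest path from this op to a sink.
    height = [1] * num_ops
    for op in reversed(range(num_ops)):
        for s in succs[op]:
            height[op] = max(height[op], 1 + height[s])

    done, cycles = set(), 0
    while len(done) < num_ops:
        # An op is ready once all of its producers finished in earlier cycles.
        ready = [op for op in range(num_ops) if op not in done and preds[op] <= done]
        ready.sort(key=lambda op: (-height[op], op))
        done.update(ready[:width])  # issue up to `width` ops this cycle
        cycles += 1
    return cycles


def fir_output_dag(taps):
    """Hypothetical DAG for one output y[n] = sum_k h[k] * x[n - k].

    Ops 0..taps-1 are the independent multiplies; ops taps..2*taps-1 form a
    serial accumulation chain (acc = acc + product), i.e. the sequential
    part of one loop iteration.
    """
    deps = []
    for k in range(taps):
        add = taps + k
        deps.append((k, add))            # product k feeds accumulation step k
        if k > 0:
            deps.append((add - 1, add))  # each step waits for the previous sum
    return 2 * taps, deps


n_ops, deps = fir_output_dag(18)
single_issue = list_schedule(n_ops, deps, width=1)  # one op per cycle
four_issue = list_schedule(n_ops, deps, width=4)    # bounded by the critical path
print(f"speedup = {single_issue / four_issue:.2f}")
```

With a single serial accumulation chain, the four-issue schedule is bounded by the 19-operation critical path (one multiply followed by 18 additions), giving 36/19, about 1.89. This toy graph covers one output sample only and is not meant to reproduce the values in Fig. 9; overlapping independent outputs, or a tree-shaped reduction, would expose more ILP.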
The results shown in Fig. 9 are speedups over the performance of a single pipelined
core running the C-like description of the 18-tap FIR filter presented in Fig. 8.
The leftmost bar shows the speedup obtained from ILP exploration on the four-issue
superscalar processor: it executes the filter 2.2 times faster than a pipelined
core. This indicates that the FIR filter has moderate ILP, neither high nor low,
since a four-issue superscalar processor could potentially achieve up to 4 times
the performance of a pipelined core.
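As a rough worked figure (our own framing, assuming the pipelined baseline sustains about one instruction per cycle), the superscalar result corresponds to exploiting a little over half of the available issue width, where S_SS is the measured superscalar speedup and W = 4 is the issue width:

$$E_{\mathrm{ILP}} = \frac{S_{\mathrm{SS}}}{W} = \frac{2.2}{4} = 0.55$$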
Considering the MPSoC composed of pipelined cores, the 6-core machine provides an
almost linear speedup, reducing the execution time of a single pipelined core by a
factor of 5.96. This near-linear behavior holds as more pipelined cores are added,
but only up to a point: when the 18-tap FIR filter is explored for maximum TLP
(54-core MP_IOC), a speedup of only 44.8 is achieved, showing that even
applications that are potentially well suited to TLP exploration can present
non-linear speedups. This can be explained by the sequential code present inside
each loop iteration.
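To quantify how far the MP_IOC results fall from linear, the sketch below computes parallel efficiency E = S/N and the Karp-Flatt metric e = (1/S - 1/N)/(1 - 1/N), i.e. the serial fraction that Amdahl's law would need in order to produce the observed speedup S on N cores. Both metrics and the function names are our illustrative choice and are not part of the original evaluation; the inputs are the two MP_IOC speedups quoted in the text.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup achieved: E = S / N."""
    return speedup / cores


def karp_flatt(speedup, cores):
    """Experimentally determined serial fraction e = (1/S - 1/N) / (1 - 1/N).

    This inverts Amdahl's law S = 1 / (e + (1 - e) / N) for e.
    """
    return (1 / speedup - 1 / cores) / (1 - 1 / cores)


# Speedups over a single pipelined core quoted in the text for MP_IOC.
reported = {6: 5.96, 54: 44.8}

for cores, s in reported.items():
    print(f"{cores:2d} cores: S={s:5.2f}  "
          f"E={parallel_efficiency(s, cores):.3f}  "
          f"e={karp_flatt(s, cores):.4f}")
```

For the two reported points this gives efficiencies of about 0.993 and 0.830 and serial fractions of about 0.13% and 0.39%. Under Amdahl's law, a serial fraction below half a percent is already enough to hold 54 cores to roughly 45 times the single-core performance.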
 
 