Multicore Systems on Chip - Signal Processing Systems - page 522


[Figure 9: bar chart. x-axis: Architectures; y-axis: speedup, 0 to 80. Bar values recovered from the extracted text: 2.2, 6.0, 13.3, 17.3, 37.1, 44.8, 74.9.]

Fig. 9 Speedup provided in 18-tap FIR filter execution for superscalar, MPSoC and a mix of both approaches

To illustrate the performance impact of TLP and ILP exploitation on DSP applications, we evaluated the execution of the 18-tap FIR filter on three different architectures: a four-issue superscalar (SS); 6-, 18- and 54-core MPSoCs based on pipelined cores with no ILP exploitation capabilities (MP_IOC); and, as a glimpse of the future, hypothetical 6-, 18- and 54-core MPSoCs based on four-issue superscalar processors, able to exploit both ILP and TLP (MP_SS). We extracted the speedup with a tool [16] that builds all data dependence graphs of the application. Then, considering the characteristics of the evaluated architectures, the execution time of each graph is measured to obtain its speedup over the baseline processor. Note that instruction and thread communication overhead was not taken into account in this experiment.
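
The measurement step described above can be illustrated with a small model. The following is only a minimal sketch, not the tool cited as [16]: it assumes unit-latency operations, a greedy in-order issue policy, and a hypothetical 4-tap multiply-accumulate graph (`mac_graph`); the function name `schedule_length` is likewise an illustrative choice.

```python
def schedule_length(deps, issue_width):
    """Cycles needed to run a dependence graph of unit-latency operations.

    deps maps each operation to the operations it depends on.
    issue_width is the maximum number of operations started per cycle
    (1 models a simple pipelined core, 4 a four-issue superscalar).
    This is an assumed, simplified model, not the method of the cited tool.
    """
    remaining = {op: set(pre) for op, pre in deps.items()}
    done = set()
    cycles = 0
    while remaining:
        # Operations whose predecessors all completed in earlier cycles.
        ready = [op for op, pre in remaining.items() if pre <= done]
        if not ready:
            raise ValueError("dependence graph contains a cycle")
        issued = ready[:issue_width]  # greedy: take the first ready operations
        for op in issued:
            del remaining[op]
        done.update(issued)
        cycles += 1
    return cycles


# Hypothetical graph: four independent multiplies feeding a chain of adds.
mac_graph = {
    "m0": [], "m1": [], "m2": [], "m3": [],
    "a1": ["m0", "m1"],
    "a2": ["a1", "m2"],
    "a3": ["a2", "m3"],
}

baseline_cycles = schedule_length(mac_graph, issue_width=1)
superscalar_cycles = schedule_length(mac_graph, issue_width=4)
speedup = baseline_cycles / superscalar_cycles
```

In this toy graph the multiplies can issue together, but the chained additions still execute one per cycle, so the four-issue model finishes well short of a fourfold speedup.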

The results in Fig. 9 show the speedup over a single pipelined core running the C-like description of the 18-tap FIR filter presented in Fig. 8. The leftmost bar shows the speedup obtained through ILP exploitation by a four-issue superscalar processor. In this case, the superscalar processor runs the filter 2.2 times faster than a pipelined core. This indicates that the FIR filter has a moderate amount of ILP, since a four-issue superscalar processor could potentially achieve up to 4 times the performance of a pipelined core.
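
Fig. 8 is not reproduced on this page. As a stand-in, the sketch below shows a generic direct-form FIR filter, which is an assumption about the kind of code being measured rather than the authors' listing. The coefficient values and the names `fir`, `coeffs` and `y` are hypothetical.

```python
def fir(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * x[n - k]."""
    taps = len(coeffs)
    out = []
    for n in range(len(samples)):
        # Each output sample depends only on the inputs, so different
        # values of n could be computed by different cores (TLP).
        acc = 0.0
        for k in range(taps):
            # Each accumulation depends on the previous one, which limits
            # how many operations can issue in the same cycle (ILP).
            if n - k >= 0:
                acc += coeffs[k] * samples[n - k]
        out.append(acc)
    return out


coeffs = [1.0 / 18] * 18      # hypothetical 18-tap moving-average coefficients
y = fir([1.0] * 40, coeffs)   # constant input settles to an output close to 1.0
```

The outer loop is the natural unit of thread-level parallelism, while the accumulator chain in the inner loop is one plausible reason a four-issue machine reaches 2.2 rather than 4.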

For the MPSoC composed of pipelined cores, the 6-core machine provides an almost linear speedup, reducing the execution time of the single pipelined core by a factor of 5.96. This behavior is maintained as more pipelined cores are added. However, when the 18-tap FIR filter is run at the maximum TLP (54-MP_IOC), a speedup of only 44.8 times is achieved, showing that even applications that are potentially well suited to TLP exploitation can present non-linear speedups. This can be explained by the sequential code present inside each loop iteration.
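
The gap between the reported and the ideal speedup can be quantified with a short calculation. The sketch below uses only the two speedups quoted in the text (5.96 on 6 cores, 44.8 on 54 cores). Interpreting the gap through Amdahl's law is an assumption made here for illustration, not a model stated by the authors, and the function names are hypothetical.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of the ideal linear speedup that is actually achieved."""
    return speedup / cores


def amdahl_serial_fraction(speedup, cores):
    """Serial fraction f that solves speedup = 1 / (f + (1 - f) / cores).

    Assumes Amdahl's law applies; the source only attributes the loss to
    sequential code inside each loop iteration.
    """
    return (cores / speedup - 1.0) / (cores - 1.0)


# Speedups quoted in the text for the pipelined-core MPSoCs.
reported = {6: 5.96, 54: 44.8}

for cores, speedup in reported.items():
    eff = parallel_efficiency(speedup, cores)
    frac = amdahl_serial_fraction(speedup, cores)
    print(f"{cores} cores: efficiency {eff:.3f}, implied serial fraction {frac:.4f}")
```

The efficiency drops from about 0.99 at 6 cores to about 0.83 at 54 cores. Under the Amdahl assumption, a serial fraction well below 1% is already enough to account for the 54-core result.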
