Information Technology Reference
In-Depth Information
20
P2
P3,P5
P4,P6
P8
P7
18
16
14
12
10
8
6
4
2
0
1
2
4
8
16
32
64
128
256
512
1024
2048
4096
8192 16384
Number of Processors
Fig. 6. Theoretical speedup limits calculated from Amdahl's law
We can estimate α by using the measured speedup SU on a specific number of proces-
sors sn as follows:
SU 1
sn 1
α estimated =
.
(9)
The results show that for Program 2 the speedup obtained with 8 processors is almost
the limit for it, but the speedup for Program 7 can still grow up to 20 , which implies
reducing the execution time to less than 15 minutes.
Unfortunately, we have not been able to check how the results of Amdahl's law
approach to reality. We tried to execute the Program 7 in the Superdome Ben, but ex-
ecuting it using 32 cores the time consumed was much higher than using 2 cores in a
node of the cluster. It is owing to the computing speed (819 Gflops in the Superdome
and 9.72 Tflops in the cluster).
As an alternative, we tried using MPI (Message Passing Interface Standard) [31]
in order to execute our programs using different nodes of the cluster simultaneously.
However, we encountered the problem of an excessive memory requirement, due to the
need to replicate data across processes, and consequently we failed in the execution of
the programs by this way too.
7
Conclusions and Outlook
Thanks to OpenMP parallelization techniques running under a supercomputing shared
memory environment we succeded to evaluate the perfomance of a CCA application
at different stages of technology deployment. To conclude, we were able to solve a
program with a sequential execution time of 297 . 975 minutes in only 50 . 402 minutes.
Regarding the problems we have encountered, as future work, we aim to improve
our analytical model, trying to reduce as possible the computational and memory costs.
Search WWH ::




Custom Search