to fully understand all the required steps to deploy a script using different underlying applications. More complex scripts can be extended using standard Python programming techniques.
4.2 Standard SISIM Scalability Test
In the first test the simulation workload was distributed using our strategy, without any code modifications to the existing sisim program. We implemented a Python script that sets the parameters and data files and points to the sisim executable. A simple main application was also developed. This main application creates a Task that configures the target code's input parameters, uploads the target code and the corresponding input data to the shared storage system, and finally sends the Task for execution. The code was instrumented to report partial and overall execution times.
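A minimal sketch of this wrapper, in the spirit of the description above, follows. The Task class, file names, and local launch are hypothetical stand-ins; the framework's actual upload-to-shared-storage and remote-submission calls are not shown.

import subprocess
import time

class Task:
    """Hypothetical description of one sisim run."""
    def __init__(self, executable, param_file, data_files):
        self.executable = executable   # path to the target code
        self.param_file = param_file   # sisim parameter file
        self.data_files = data_files   # input data to stage on shared storage

def run_task(task):
    """Launch sisim locally and report its wall-clock execution time."""
    start = time.time()
    # GSLIB programs such as sisim read the parameter file name from stdin
    subprocess.run([task.executable],
                   input=(task.param_file + "\n").encode(),
                   check=True)
    return time.time() - start

if __name__ == "__main__":
    task = Task("./sisim", "sisim.par", ["cluster.dat"])
    print(f"execution time: {run_task(task):.2f} s")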
This case study compares the base execution of the standard sisim routine against the distributed version on up to 32 cores of the homogeneous cluster. Timing and speedup results are shown in Table 1. The base case denotes an execution of sisim with N = 96 (number of simulations) and a domain Ω of 2,880,000 points. Single-node tests use the strategy to distribute simulations on one machine as independent native system processes, namely up to four parallel workers running 24 simulations each. Distributed tests use four workers per node, with the implemented round-robin scheduler assigning the workload (sketched below). Under this configuration, sisim was parallelized up to 32 instances, each worker running 3 simulations.
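The following sketch illustrates the round-robin assignment just described: the 96 simulations are dealt cyclically to the available workers, so each of 4 workers receives 24 simulations, and each of 32 workers receives 3. The worker naming scheme is illustrative, not the framework's actual one.

def round_robin(n_simulations, workers):
    """Return {worker: [simulation indices]} assigned cyclically."""
    assignment = {w: [] for w in workers}
    for sim in range(n_simulations):
        assignment[workers[sim % len(workers)]].append(sim)
    return assignment

# 8 nodes with 4 workers each = 32 workers in total
workers = [f"node{n}-worker{w}" for n in range(8) for w in range(4)]
plan = round_robin(96, workers)
print(len(plan["node0-worker0"]))   # -> 3 simulations per worker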
Table 1. Standard algorithm parallelized and distributed

case                processes   time [s]   speedup   efficiency
base                     1       9124.64     1.00        -
sisim single node        1       9240.11     0.99       99%
                         2       4748.68     1.91       95%
                         3       3240.30     2.82       94%
                         4       2518.77     3.62       91%
sisim distributed        4       2890.09     3.16       79%
                         8       1515.58     6.02       75%
                        12       1101.01     8.29       69%
                        16        909.53    10.03       63%
                        24        712.66    12.80       53%
                        32        627.36    14.54       45%
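The speedup and efficiency columns of Table 1 follow the standard definitions: speedup is the base execution time divided by the parallel execution time, and efficiency is the speedup divided by the number of processes. A quick check against the four-worker single-node row:

# Derivation of the speedup and efficiency columns in Table 1,
# using the base run (9124.64 s) as the sequential reference.
T_BASE = 9124.64

def speedup(t_parallel):
    return T_BASE / t_parallel

def efficiency(t_parallel, processes):
    return speedup(t_parallel) / processes

# 4 workers on a single node:
print(f"{speedup(2518.77):.2f}")        # -> 3.62
print(f"{efficiency(2518.77, 4):.0%}")  # -> 91%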
Table 1 shows that, using the proposed strategy, it is possible to increase the speedup on a single node, achieving an efficiency of 91% when using all available resources on a local workstation. The distributed test results also demonstrate scalability, progressively reducing the overall computation time. However, they show that efficiency is lost as more nodes are added. This phenomenon is explained by
 