to explore new methods, techniques or just to have a simple proof of concept.
In mining, projects require uncertainty quantification for risk analysis, which is
done through the construction of multiple simulated scenarios. These scenarios
are often represented by a numerical model that discretizes the volume of the
ore deposit into small cells, each one requiring a prediction of its properties, such
as grades or geological attributes. The construction of these models is costly in
computing time and is currently done using legacy code.
Many attempts to parallelize these algorithms have been made, but most of
them have targeted specific codes rather than providing a general solution
applicable to all algorithms. The focus has been on optimizing interpolation
methods [4,6,24], sequential simulation code [19,18,27], and multiple-point
geostatistical methods [14,15,20,21,22,25]. In many of these cases, a GPU-based
approach has been central; we aim instead at providing a more general solution
that uses many computers with different hardware characteristics.
Our goal is to solve these problems by enabling the use of multiple computers
connected in a local network, making them work together seamlessly. This would
allow scientists to run many types of legacy code for large-scale applications and
to have a simple scheduler for easy and efficient execution of tasks.
In recent work, Bergen et al. [5] successfully transformed existing monolithic C
applications into a semi-automatic distributed system, making that legacy code
relevant again through the use of remote procedure calls in an approach similar
to map-reduce, requiring only small modifications to the legacy code. Later,
Lunacek et al. [17] showed through a scaling study that Python is an excellent
option for executing many tasks on a compute cluster. There are other solutions
based on software as a service [3] or cloud technologies [1,2], but we pursue a
different goal: to develop an in-house tool that requires neither extra
configuration on our machines nor the installation of an enterprise solution.
Based on these experiences, our contribution is a simple strategy for distributed
and parallel execution of tasks on an existing heterogeneous local computer
network, in a clean and efficient way. We chose Python because it is easy to
learn, provides a wide range of scientific tools, is supported by a strong
community, and is multi-platform.
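The strategy just described can be illustrated with a minimal, hypothetical sketch: a Python scheduler that submits many independent legacy-code runs to a pool of workers and collects their outputs. This is not the system presented in this paper; the names (`run_legacy_task`, `schedule`) are illustrative, a trivial Python subprocess stands in for a legacy simulator, and a thread pool on one machine stands in for the networked workers.

```python
# Hypothetical sketch of many-task scheduling with Python's standard
# library. A real deployment would dispatch to remote machines; here a
# local thread pool and a trivial subprocess stand in for illustration.
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_legacy_task(task_id: int) -> str:
    """Run one external process. A trivial Python invocation stands in
    for a legacy executable (e.g. a geostatistical simulator)."""
    result = subprocess.run(
        [sys.executable, "-c", f"print('scenario {task_id} done')"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def schedule(n_tasks: int, n_workers: int = 4) -> list:
    """Minimal scheduler: submit all tasks, collect results in order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(run_legacy_task, range(n_tasks)))


if __name__ == "__main__":
    for line in schedule(3):
        print(line)
```

Because each simulated scenario is independent, this embarrassingly parallel pattern needs no communication between tasks, which is what makes a simple scheduler over heterogeneous machines feasible.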
This document is organized as follows: a description of the proposed strategy
and implementation topics are presented in sections 2 and 3, respectively. In
section 4, a case study using sequential indicator simulation (sisim in GSLIB [7])
is developed, and section 5 discusses the results and presents some ideas for
future work.
2 Strategy Design
2.1 System Requirements
For the reasons explained above, we have defined the following requirements for
our distributed and parallel execution strategy: