to explore new methods, techniques or just to have a simple proof of concept.
In mining, projects require uncertainty quantification for risk analysis, which is
done through the construction of multiple simulated scenarios. These scenarios
are often represented by a numerical model that discretizes the volume of the
ore deposit into small cells, each one requiring a prediction of its properties, such
as grades or geological attributes. The construction of these models is costly in
computing time and is currently done using legacy code.
Many attempts to parallelize these algorithms have been made, but most of
them have targeted specific codes rather than providing a general solution
applicable to all algorithms. The focus has been on optimizing interpolation
methods [4,6,24], sequential simulation code [19,18,27], and multiple-point
geostatistical methods [14,15,20,21,22,25]. In many of these cases, a GPU-based
approach has been central; we aim instead at providing a more general solution
that uses many computers with different hardware characteristics.
Our goal is to solve these problems by enabling the use of multiple computers
connected in a local network, making them work together seamlessly. This would
allow scientists to run many types of legacy code for large-scale applications and
to have a simple scheduler for easy and efficient execution of tasks.
In recent work, Bergen et al. [5] successfully transformed existing monolithic C
applications into a semi-automatic distributed system, making that legacy code
relevant again through the use of remote procedure calls in an approach similar
to map-reduce, requiring only small modifications to the legacy code. Later,
Lunacek et al. [17] showed through a scaling study that Python is an excellent
option for executing many tasks on a compute cluster. There are other solutions
based on software as a service [3] or cloud technologies [1,2], but we pursue a
different goal: to develop an in-house tool that requires neither extra
configuration on our machines nor the installation of an enterprise solution.
Based on these experiences, our contribution is a simple strategy for distributed
and parallel execution of tasks on an existing heterogeneous local computer
network, in a clean and efficient way. We chose Python because it is easy to
learn, provides a wide range of scientific tools, is supported by a strong
community, and is multi-platform.
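The strategy just described can be illustrated with a minimal, hypothetical sketch: a Python scheduler that submits many independent legacy-code runs to a pool of workers and collects their outputs. This is not the system presented in this paper; the names (`run_legacy_task`, `schedule`) are illustrative, a trivial Python subprocess stands in for a legacy simulator, and a thread pool on one machine stands in for the networked workers.

```python
# Hypothetical sketch of many-task scheduling with Python's standard
# library. A real deployment would dispatch to remote machines; here a
# local thread pool and a trivial subprocess stand in for illustration.
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_legacy_task(task_id: int) -> str:
    """Run one external process. A trivial Python invocation stands in
    for a legacy executable (e.g. a geostatistical simulator)."""
    result = subprocess.run(
        [sys.executable, "-c", f"print('scenario {task_id} done')"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def schedule(n_tasks: int, n_workers: int = 4) -> list:
    """Minimal scheduler: submit all tasks, collect results in order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(run_legacy_task, range(n_tasks)))


if __name__ == "__main__":
    for line in schedule(3):
        print(line)
```

Because each simulated scenario is independent, this embarrassingly parallel pattern needs no communication between tasks, which is what makes a simple scheduler over heterogeneous machines feasible.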
This document is organized as follows: a description of the proposed strategy
and implementation topics are presented in sections 2 and 3, respectively. In
section 4, a case study using sequential indicator simulation (sisim in GSLIB [7])
is developed, and section 5 discusses the results and presents some ideas for
future work.
2 Strategy Design
2.1 System Requirements
For the reasons explained above, we have defined the following requirements for
our distributed and parallel execution strategy: