- Polyglot model. Should be able to perform native executions of existing code and binary executables; the bottom line is to run applications written in Fortran, C, C++ and Python. This cannot be done through Remote Object Call (ROC) or Remote Procedure Call (RPC), because the source code is not always available (see the sketch after this list).
- Data handling. Must support different scientific data types such as multidimensional arrays, hash tables and time tables, among others. Continuous and categorical variables are represented in different formats and ranges: integer, float or string.
- Input/output. The parametrization of the processes and the input files required for each of them need to be simplified. The output of each execution may be numerical data, (possibly several) large binary files and large images.
- Distributed computing. Computation must be distributed easily and seamlessly. Message Passing Interface (MPI) is discarded because, on the one hand, it would require major changes to the legacy code and, on the other hand, our workstation cluster is extremely heterogeneous.
- Hardware availability. The resources available on each workstation (computation node) must be transparently visible and highly configurable, so that the limited resources can be managed and as much computing power as possible (multicore CPUs, multiple GPUs or SSD units) can be exploited without interfering with users' everyday tasks, that is, optimizing idle computing capacity.
- System heterogeneity. Should be deployable on a multi-platform system.
- Multi-master topology. Every computation node should be able to act as a master, thus configuring a decentralized system.
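
To make the polyglot requirement concrete, the following minimal Python sketch treats a legacy executable as a black box: only its path, its arguments and a working directory are needed, never its source code. The executable name and directory layout used in the usage comment are illustrative assumptions, not part of the original system.

import subprocess
from pathlib import Path

def run_legacy_task(executable, args, workdir, timeout=None):
    """Run an existing binary (e.g. a Fortran or C++ solver) as a black box.

    Only the command line and a working directory are required, so the
    source code of the legacy application is never needed.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    cmd = [str(executable), *map(str, args)]
    result = subprocess.run(
        cmd,
        cwd=workdir,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr

# Hypothetical usage: a Fortran solver driven only by its input file.
# rc, out, err = run_legacy_task("./solver.x", ["input.nml"], "runs/case_001")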
2.2 Architecture Overview
The architecture is composed of two main components: a) a distributed task
scheduler and b) a shared data storage. The task scheduler is necessary to assign
workload to different workstation nodes. The shared data storage is required
to save and retrieve input and output files, session variables and corresponding
metadata.
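
As an illustration of the second component, the sketch below assumes the shared data storage is simply a directory that every node can reach (for example a network share) and offers put/get operations for files plus an optional JSON metadata side file. The class name and layout are hypothetical; the actual storage back-end is not specified here.

import json
import shutil
from pathlib import Path

class SharedStorage:
    """File-based sketch of the shared data storage: keeps input/output
    files, session variables and their metadata under a common root."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put_file(self, session_id, local_path, metadata=None):
        # Copy the file into the session directory and, optionally,
        # store its metadata next to it as JSON.
        dest_dir = self.root / session_id
        dest_dir.mkdir(parents=True, exist_ok=True)
        dest = dest_dir / Path(local_path).name
        shutil.copy2(local_path, dest)
        if metadata:
            (dest_dir / (dest.name + ".meta.json")).write_text(json.dumps(metadata))
        return dest

    def get_file(self, session_id, name):
        # Resolve a previously stored file for retrieval by any node.
        return self.root / session_id / name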
Distributed Task Scheduler. A Task represents an abstract definition of the target (actual legacy) code that needs to be executed. Every node in the cluster must run a component in the background in order to accept and generate remote (task) execution calls. That component is called the Distributed Task Scheduler. In order to provide a strategy that is as general as possible, the scheduler modules were grouped into three categories: front-end, broker and back-end. Details are shown in Figure 1.
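
The following sketch illustrates, under simplifying assumptions, how a Task definition could travel through the three module categories: a front-end submits it, a broker queues it, and a back-end polls for the next task to run. An in-process queue stands in for the real remote calls between nodes, and all names are hypothetical rather than taken from the actual implementation.

import queue
from dataclasses import dataclass, field

@dataclass
class Task:
    """Abstract definition of the target legacy code to execute."""
    executable: str
    args: list = field(default_factory=list)
    session_id: str = "default"

class Broker:
    """Holds pending tasks and hands them to whichever back-end asks next."""
    def __init__(self):
        self.pending = queue.Queue()

    def submit(self, task):
        # Called by a front-end to enqueue work.
        self.pending.put(task)

    def next_task(self, timeout=1):
        # Polled by a back-end; returns None when no work is pending.
        try:
            return self.pending.get(timeout=timeout)
        except queue.Empty:
            return None

# Front-end side: describe the work (executable name is illustrative).
broker = Broker()
broker.submit(Task(executable="./solver.x", args=["input.nml"]))

# Back-end side: pick up the task and run it, e.g. with run_legacy_task above.
task = broker.next_task()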
 