launched using a PRACE workflow, while short tests might run on workflows tuned
for EGI resources.
We have noticed that queues can be relatively long in some clusters, which enforce strict prioritization policies. Iterating over the earthquakes in a single run therefore ensures that, once the computation has commenced, it will eventually complete for all of the earthquakes. On the other hand, we have also included an option that submits one workflow per earthquake, which definitely yields speedups in clusters with faster queues for smaller jobs. The status of each run can be monitored from the Control tab, which offers a number of useful functions, such as:
1. Download the output and error logs of the jobs in the workflow.
2. Reload the setup of the experiment.
3. Abort the run and delete the instance from the user's archive of runs.
17.3.3.2 Multilayered Workflow Specifications and Provenance
The WS-PGRADE workflows, which have been implemented for the forward-
modeling application, consist of two main jobs. One job performs the actual
computation, the other takes care of controlling the staging out of the result data
from the DCI to a data management system and cleaning up the computational
cluster. The last task is extremely important since disk resources within the DCIs
are limited and the gateway tries to automate their sustainable exploitation.
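The stage-out and cleanup task of the second job can be pictured as a short script. The following is a minimal sketch in Python, where the directory layout, the `.seed` result extension, and the function name are illustrative assumptions, not the gateway's actual implementation:

```python
import shutil
from pathlib import Path

def stage_out_and_clean(scratch_dir, archive_dir):
    """Copy result files from the cluster scratch area to a data
    management target, then remove the scratch area to free the
    limited disk resources on the DCI.

    All names here are illustrative, not the gateway's actual API.
    """
    scratch = Path(scratch_dir)
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    staged = []
    for result in sorted(scratch.glob("*.seed")):  # hypothetical result extension
        shutil.copy2(result, archive / result.name)
        staged.append(result.name)
    shutil.rmtree(scratch)  # clean up the computational cluster
    return staged
```

In a real deployment the copy step would target the data management system rather than a local directory, but the control flow (stage out first, delete scratch only afterwards) is the point of the sketch.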
As shown in Fig. 17.4, Job0 of the workflow takes as input a number of files: earthquake parameters, station information, simulator configuration, and a library of
processing elements (PEs). The library of PEs contains all of the user scripts and
utilities that will be used by the executable of Job0. These PEs have been developed
by the scientists themselves in Python, and they can be scripted and chained into
pipelines. They are implemented as streaming operators that process units of data as they arrive, avoiding the movement of whole streams in and out of the cluster. Moreover, the PEs are
capable of extracting all of the metadata information and provenance traces related
to their execution (Spinuso 2013). For instance, the MPI applications are also
launched from within a PE, in order to capture relevant information. This multilayered strategy for the workflow specification allows the extraction and storage of
fine-grained lineage data at runtime. The storage system is based on a document
store exposed via a provenance web API (Davidson 2008). The coverage of the
current provenance model can be considered to be conceptually compliant with the
W3C-PROV recommendation (http://www.w3.org/TR/2013/REC-prov-dm-20130430).
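The two ideas above, PEs as streaming operators chained into pipelines and each PE capturing provenance traces about its own execution, can be sketched with plain Python generators. The PE names, the wrapper, and the record schema below are all illustrative assumptions, not the gateway's actual library or provenance API:

```python
import datetime

# Stand-in for the document store behind the provenance web API.
PROV_LOG = []

def provenance_pe(name, func):
    """Wrap a per-unit operator as a streaming PE that logs a small
    PROV-DM-style record (activity, used, generated) for every unit.

    The schema is illustrative only, loosely mirroring PROV-DM terms.
    """
    def wrapped(stream):
        for i, unit in enumerate(stream):
            out = func(unit)
            PROV_LOG.append({
                "activity": f"{name}:{i}",
                "used": repr(unit),
                "generated": repr(out),
                "endedAtTime": datetime.datetime.now(
                    datetime.timezone.utc).isoformat(),
            })
            yield out
    return wrapped

# Two hypothetical PEs operating on one trace (a list of samples) at a time.
detrend = provenance_pe("detrend", lambda t: [x - sum(t) / len(t) for x in t])
peak = provenance_pe("peak", lambda t: max(abs(x) for x in t))

# Chain the PEs into a pipeline; units flow through one at a time,
# so whole streams are never materialized.
traces = iter([[1.0, 2.0, 3.0], [4.0, 0.0]])
results = list(peak(detrend(traces)))
print(results)        # [1.0, 2.0]
print(len(PROV_LOG))  # 4 records: two PEs x two units
```

Because each record is produced at the moment a unit is emitted, lineage is captured at runtime and at the granularity of individual data units, which is the fine-grained behavior the multilayered specification is designed to enable.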