launched using a PRACE workflow, while short tests might run on workflows tuned
for EGI resources.
We have noticed that queues can be relatively long in some clusters, which enforce strict prioritization policies. Iterating over the earthquakes in a single run therefore ensures that, once the computation has commenced, it will eventually complete for all of the earthquakes. On the other hand, we have also included an option that submits one workflow per earthquake, which definitely yields speedups in clusters with faster queues for smaller jobs. The status of each run can be monitored from the Control tab, which offers a number of useful functions, such as:
1. Download the output and error logs of the jobs in the workflow.
2. Reload the setup of the experiment.
3. Abort the run and delete the instance from the user's archive of runs.
17.3.3.2 Multilayered Workflow Specifications and Provenance
The WS-PGRADE workflows, which have been implemented for the forward-
modeling application, consist of two main jobs. One job performs the actual
computation, the other takes care of controlling the staging out of the result data
from the DCI to a data management system and cleaning up the computational
cluster. The last task is extremely important since disk resources within the DCIs
are limited and the gateway tries to automate their sustainable exploitation.
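The stage-out and cleanup task of the second job can be pictured as a short script. The following is a minimal sketch in Python, where the directory layout, the `.seed` result extension, and the function name are illustrative assumptions, not the gateway's actual implementation:

```python
import shutil
from pathlib import Path

def stage_out_and_clean(scratch_dir, archive_dir):
    """Copy result files from the cluster scratch area to a data
    management target, then remove the scratch area to free the
    limited disk resources on the DCI.

    All names here are illustrative, not the gateway's actual API.
    """
    scratch = Path(scratch_dir)
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    staged = []
    for result in sorted(scratch.glob("*.seed")):  # hypothetical result extension
        shutil.copy2(result, archive / result.name)
        staged.append(result.name)
    shutil.rmtree(scratch)  # clean up the computational cluster
    return staged
```

In a real deployment the copy step would target the data management system rather than a local directory, but the control flow (stage out first, delete scratch only afterwards) is the point of the sketch.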
As shown in Fig. 17.4, Job0 of the workflow takes as input a number of files: earthquake parameters, station information, simulator configuration, and a library of
processing elements (PEs). The library of PEs contains all of the user scripts and
utilities that will be used by the executable of Job0. These PEs have been developed
by the scientists themselves in Python, and they can be scripted and chained into
pipelines. They are implemented as streaming operators that process units of data as they arrive, avoiding the movement of whole streams in and out of the cluster. Moreover, the PEs are
capable of extracting all of the metadata information and provenance traces related
to their execution (Spinuso 2013). For instance, the MPI applications are also
launched from within a PE, in order to capture relevant information. This multilayered strategy for the workflow specification allows the extraction and storage of
fine-grained lineage data at runtime. The storage system is based on a document
store exposed via a provenance web API (Davidson 2008). The coverage of the
current provenance model can be considered to be conceptually compliant with the
W3C-PROV recommendation (http://www.w3.org/TR/2013/REC-prov-dm-20130430).
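The two ideas above, PEs as streaming operators chained into pipelines and each PE capturing provenance traces about its own execution, can be sketched with plain Python generators. The PE names, the wrapper, and the record schema below are all illustrative assumptions, not the gateway's actual library or provenance API:

```python
import datetime

# Stand-in for the document store behind the provenance web API.
PROV_LOG = []

def provenance_pe(name, func):
    """Wrap a per-unit operator as a streaming PE that logs a small
    PROV-DM-style record (activity, used, generated) for every unit.

    The schema is illustrative only, loosely mirroring PROV-DM terms.
    """
    def wrapped(stream):
        for i, unit in enumerate(stream):
            out = func(unit)
            PROV_LOG.append({
                "activity": f"{name}:{i}",
                "used": repr(unit),
                "generated": repr(out),
                "endedAtTime": datetime.datetime.now(
                    datetime.timezone.utc).isoformat(),
            })
            yield out
    return wrapped

# Two hypothetical PEs operating on one trace (a list of samples) at a time.
detrend = provenance_pe("detrend", lambda t: [x - sum(t) / len(t) for x in t])
peak = provenance_pe("peak", lambda t: max(abs(x) for x in t))

# Chain the PEs into a pipeline; units flow through one at a time,
# so whole streams are never materialized.
traces = iter([[1.0, 2.0, 3.0], [4.0, 0.0]])
results = list(peak(detrend(traces)))
print(results)        # [1.0, 2.0]
print(len(PROV_LOG))  # 4 records: two PEs x two units
```

Because each record is produced at the moment a unit is emitted, lineage is captured at runtime and at the granularity of individual data units, which is the fine-grained behavior the multilayered specification is designed to enable.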