complete simulation output (as a single file) can be sent to the archive system.
Thus, the automated solution groups files into appropriately sized chunks while
taking care of other requirements, for example, ensuring that all data for one
timestep goes into the same chunk.* Finally, recording the data provenance of
all generated data becomes increasingly important as the size and complexity
of the output grows. For example, from an automatically generated diagnostic
image, a scientist must be able to easily find the output of the simulation
corresponding to the visualization. Tools can greatly help with transferring
the relevant data to the scientist's host machine (which could be at a remote
site) provided that the above simulation management workflow records the
necessary data lineage of all operations.
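To make the chunking requirement concrete, here is a minimal Python sketch; the file-naming pattern ("name.<step>.<ext>"), the 4 GiB limit, and the JSON lineage log are assumptions of this illustration, not part of the actual workflow. It groups output files into size-bounded chunks without ever splitting a timestep, and appends one provenance record per chunk:

import json
import os
from collections import defaultdict

CHUNK_LIMIT = 4 * 2**30   # assumed 4 GiB target chunk size

def chunk_by_timestep(files, limit=CHUNK_LIMIT):
    # Group files by timestep; the "name.<step>.<ext>" naming pattern
    # is an assumption of this sketch.
    by_step = defaultdict(list)
    for path in files:
        by_step[int(path.rsplit(".", 2)[-2])].append(path)
    chunks, current, size = [], [], 0
    for step in sorted(by_step):
        group = by_step[step]
        group_size = sum(os.path.getsize(p) for p in group)
        # Never split a timestep: open a new chunk if this one is full.
        if current and size + group_size > limit:
            chunks.append(current)
            current, size = [], 0
        current += group
        size += group_size
    if current:
        chunks.append(current)
    return chunks

def record_lineage(log_path, chunk_id, inputs):
    # One JSON line per archived chunk links it back to its input files.
    with open(log_path, "a") as log:
        log.write(json.dumps({"chunk": chunk_id, "inputs": inputs}) + "\n")

A real workflow would record richer lineage (job identifiers, parameters, timestamps), but even a one-line-per-chunk log suffices to trace a diagnostic image back to the simulation output it came from.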
Pipeline Parallel Processing. An important feature of the Kepler environment
is its support for the dataflow process network model of computation [19, 20],
implemented via the Process Network (PN) director [16]. Under the PN director,
all actors run continuously in separate threads, processing input as soon as
it arrives. Each pipeline in the above workflow
is therefore processing a stream of data items in pipeline-parallel mode. For
example, since XGC1 outputs diagnostic data into three NetCDF files at each
timestep, plots can be created for one file while a second is being merged
and a third is being transferred. In a typical production
run scenario, XGC1 outputs a new timestep every 30 seconds. The time to get
one file through the processing pipeline includes the time for recognizing its
presence, the transfer time, and the execution time of the plot generation job
on the processing cluster. If the workflow performed only one of these steps at
a time (e.g., as prescribed by the SDF director), the simulation would generate
files faster than they could be processed. Due to the size of the 3D data in
the HDF5 pipeline and the longer transfer time of those files, the situation is
similar in this pipeline as well. Finally, the archiving process must work
in parallel with the rest of the workflow, since archiving is itself slow.
If the task and pipeline parallelism exhibited by the above workflow
is not enough to keep up with the flow of data, one can replicate individual
actors on different compute nodes to process multiple data items at the
same time. Although the above workflow does not currently need to do this, a
more complex production workflow is in use for coupling other codes (such as
those described in Section 5) with XGC0 [25], the predecessor of XGC1; there,
a parameter study has to be executed for each timestep of the simulation, and
that study is executed in this parallel mode.
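Kepler itself is Java-based, but the pipeline-parallel pattern is easy to illustrate with a short Python sketch; the three stage functions below are placeholders, not the workflow's real actors. Each stage runs in its own thread and consumes from a queue, so a new file can be in transfer while earlier ones are being plotted and archived:

import queue
import threading

def transfer_file(path):   # placeholder for the actual file transfer
    return path

def make_plots(path):      # placeholder for the plot-generation job
    return path

def archive_file(path):    # placeholder for the archiving step
    return None

def stage(inbox, outbox, work):
    # One pipeline stage: take items as they arrive, process each,
    # and pass the result downstream; None marks end of stream.
    while True:
        item = inbox.get()
        if item is None:
            if outbox is not None:
                outbox.put(None)
            return
        result = work(item)
        if outbox is not None:
            outbox.put(result)

q_in, q_plot, q_arch = queue.Queue(), queue.Queue(), queue.Queue()
threads = [threading.Thread(target=stage, args=a)
           for a in ((q_in, q_plot, transfer_file),
                     (q_plot, q_arch, make_plots),
                     (q_arch, None, archive_file))]
for t in threads:
    t.start()
for ts in range(3):                  # feed a few timestep files
    q_in.put("diag.%05d.nc" % ts)
q_in.put(None)                       # signal end of stream
for t in threads:
    t.join()

With a new file arriving every 30 seconds, such a pipeline keeps up as long as each individual stage takes less than 30 seconds per file, even when the end-to-end latency of a single file is much longer.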
Robustness of Workflows. There are two different but related aspects of
robustness in compute-intensive workflows: what happens if the overall
workflow execution fails and stops (e.g., at the workflow engine level), and
what happens if an individual task in the workflow fails? For
* An additional problem arises when data is generated faster than it can be
archived. In this case, an extra workflow step can be inserted that queues the
data on an auxiliary disk, decoupling the slow archival from the fast data
generation.
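A minimal Python sketch of that decoupling step (the spool directory and the archive_one callback are assumptions of this illustration): the producer moves finished files to an auxiliary disk immediately, and a separate consumer drains them at whatever pace the archive system sustains.

import os
import shutil

SPOOL = "/scratch/archive_spool"   # hypothetical auxiliary-disk location

def spool(path):
    # Producer side: move a finished file onto the auxiliary disk at
    # once, so the simulation side never waits on the archive system.
    os.makedirs(SPOOL, exist_ok=True)
    shutil.move(path, SPOOL)

def drain(archive_one):
    # Consumer side: archive spooled files oldest-first; archive_one
    # (e.g., a mass-storage put) is assumed to be supplied by the caller.
    paths = sorted((os.path.join(SPOOL, f) for f in os.listdir(SPOOL)),
                   key=os.path.getmtime)
    for p in paths:
        archive_one(p)
        os.remove(p)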