whether the simulation is progressing correctly. The simulation itself executes on a dedicated supercomputer (the primary cluster) at Oak Ridge National Laboratory (ORNL), while a secondary cluster computer at ORNL is used for on-the-fly analysis of the simulation run on the primary cluster.
The first pipeline (shown in the center of Figure 13.1) performs the
NetCDF file processing portion of the monitoring workflow. This pipeline
starts by checking the availability of NetCDF files. As each such file grows
(they are extended after every diagnostic period), the workflow performs split (taking the most recent data entry), transfer, and merge operations on recent data to mirror XGC1's output on the secondary analysis cluster efficiently. Finally, images are generated using xmgrace* for all variables in the
output for each diagnostic time step and placed into a remote directory where
the scientist can browse them via the Web-based dashboard application 25
(cf. Section 13.5). The split and merge operations are executed on the lo-
gin nodes of the primary simulation machine and on the secondary analysis
cluster, respectively. To make the plots, however, a job has to be submitted on the secondary cluster for each file in each step. Although each such job is small, lasting only a couple of seconds, there is almost always one running; executing these jobs on the login node of the primary cluster would therefore typically overload it.
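To make this concrete, the following Python sketch approximates the split, transfer, and merge loop for one growing NetCDF file. It is not the Kepler implementation used in the workflow: the host name, mirror directory, and the remote merge_step.py helper are hypothetical, and it assumes the netCDF4 package, that time is the leading unlimited dimension, and passwordless scp/ssh access to the analysis cluster.

# Minimal sketch (not the actual Kepler actors): watch a growing NetCDF file,
# split off each new diagnostic record, transfer it, and trigger a remote merge.
import os
import subprocess
import time
from netCDF4 import Dataset

ANALYSIS_HOST = "analysis.example.org"   # hypothetical secondary-cluster host
REMOTE_DIR = "/mirror/xgc1"              # hypothetical mirror directory

def record_count(path):
    """Number of records along the unlimited (time) dimension."""
    with Dataset(path) as ds:
        return len(ds.dimensions["time"])

def split(path, step):
    """Copy a single diagnostic record into a small standalone file."""
    chunk = f"{path}.step{step:05d}.nc"
    with Dataset(path) as src, Dataset(chunk, "w") as dst:
        dst.createDimension("time", None)
        for name, dim in src.dimensions.items():
            if name != "time":
                dst.createDimension(name, len(dim))
        for name, var in src.variables.items():
            out = dst.createVariable(name, var.dtype, var.dimensions)
            if "time" in var.dimensions:
                out[0] = var[step]       # newest record only; time assumed leading
            else:
                out[:] = var[:]          # static coordinate/metadata variable
    return chunk

def transfer_and_merge(chunk):
    """Ship the chunk and append it to the mirrored file on the remote side."""
    name = os.path.basename(chunk)
    subprocess.run(["scp", chunk, f"{ANALYSIS_HOST}:{REMOTE_DIR}/"], check=True)
    # merge_step.py is a hypothetical remote script that appends the record
    subprocess.run(["ssh", ANALYSIS_HOST,
                    f"python {REMOTE_DIR}/merge_step.py {REMOTE_DIR}/{name}"],
                   check=True)

def monitor(path, poll_seconds=30):
    """Poll the growing NetCDF file and process each new diagnostic step."""
    done = 0
    while True:                          # for the lifetime of the simulation
        total = record_count(path)
        for step in range(done, total):
            transfer_and_merge(split(path, step))
        done = total
        time.sleep(poll_seconds)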
The second pipeline (bottom of Figure 13.1) performs the BP-HDF5 pro-
cessing . This pipeline's role is similar to the NetCDF pipeline, but with the
following differences. For each step, XGC1 creates new BP files (a custom
binary-packed format); hence, there are no split and merge steps when trans-
ferring them to the secondary processing site. The BP files are converted to
HDF5 using an external code, and then images are created for all 2D slices
of the 3D data stored in those files using an AVS/Express dataflow network.
For this purpose, the pipeline starts AVS/Express as a remote job on the
secondary cluster and then makes image-creation requests to it as a (private)
service.
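The following sketch approximates that stage in Python: it invokes an external BP-to-HDF5 converter and then iterates over the 2D slices of a 3D dataset, issuing one image-creation request per slice. The converter name bp2h5, the dataset name, and the request_image stand-in for the AVS/Express service call are all assumptions; the chapter does not specify those details.

# Sketch of the BP-HDF5 stage (not the actual workflow code): convert a BP
# file to HDF5 with an external converter, then request an image for every
# 2D slice of a 3D dataset.  Converter name, dataset name, and the image
# request are placeholders for the real components.
import subprocess
import h5py

def convert_bp_to_hdf5(bp_path):
    """Run an external BP-to-HDF5 converter (name assumed here)."""
    h5_path = bp_path.replace(".bp", ".h5")
    subprocess.run(["bp2h5", bp_path, h5_path], check=True)
    return h5_path

def request_image(slice_2d, out_png):
    """Stand-in for the image-creation request to the AVS/Express service."""
    print(f"would render {slice_2d.shape} slice -> {out_png}")

def render_all_slices(h5_path, dataset="/field3d", axis=0):
    """Iterate over all 2D slices of a 3D dataset and render each one."""
    with h5py.File(h5_path, "r") as f:
        data = f[dataset]                     # hypothetical 3D dataset name
        for i in range(data.shape[axis]):
            index = [slice(None)] * 3
            index[axis] = i                   # i-th slice along the chosen axis
            request_image(data[tuple(index)], f"slice_{axis}_{i:04d}.png")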
This workflow uses a set of fine-grained job-control steps provided by Ke-
pler for calling AVS/Express. The workflow waits until the AVS/Express job
is started on its execution host, performs the other tasks while the job is run-
ning, and stops the job at the end of the processing. The individual steps in
Figure 13.1 are workflows themselves (i.e., subworkflows, or composite actors
in Kepler terminology), each implementing a specific task. One such subworkflow
is the archival step in the HDF5 pipeline, which assembles files into large
chunks and stores them in a remote mass-storage system. The steps used in
this workflow are described further in Podhorszki et al. 26
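The job-control pattern itself (submit the service job, wait until it is running, do the processing, stop it) can be sketched as follows. This is a hedged illustration rather than Kepler's actors: it assumes a PBS-style batch scheduler on the secondary cluster reachable over ssh, and the qsub/qstat/qdel commands, host name, and job script are placeholders.

# Sketch of the fine-grained job-control pattern: submit the AVS/Express
# service job, wait until it runs, use it, then stop it.  Commands, host
# name, and job script are assumptions, not the workflow's actual steps.
import subprocess
import time

HOST = "analysis.example.org"                 # hypothetical secondary cluster

def remote(*cmd):
    """Run a command on the secondary cluster and return its stdout."""
    return subprocess.run(["ssh", HOST, *cmd], check=True,
                          capture_output=True, text=True).stdout.strip()

def submit_service_job(script="start_avs_express.sh"):
    """Submit the long-running rendering job and return its job id."""
    return remote("qsub", script)

def wait_until_running(job_id, poll_seconds=10):
    """Block until the scheduler reports the job in the running state."""
    while "job_state = R" not in remote("qstat", "-f", job_id):
        time.sleep(poll_seconds)

def stop_job(job_id):
    """Cancel the service job once all image requests have been handled."""
    remote("qdel", job_id)

def run_pipeline(process_all_steps):
    """Bracket the per-step processing with job start-up and shutdown."""
    job_id = submit_service_job()
    wait_until_running(job_id)
    try:
        process_all_steps()                   # issue image-creation requests
    finally:
        stop_job(job_id)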
* A tool for graphing, advanced computation, and exploration of data; see http://plasma-gate.weizmann.ac.il/Grace/
http://www.avs.com/software/soft t/avsxps.html