whether the simulation is progressing correctly. The simulation itself executes on a dedicated supercomputer (the primary cluster) at Oak Ridge National Laboratory (ORNL), while a secondary cluster computer at ORNL is used for on-the-fly analysis of the simulation run on the primary cluster.
The first pipeline (shown in the center of Figure 13.1) performs the
NetCDF file processing portion of the monitoring workflow. This pipeline
starts by checking the availability of NetCDF files. As each such file grows
(they are extended after every diagnostic period), the workflow performs split (taking the most recent data entry), transfer, and merge operations on recent data to mirror XGC1's output on the secondary analysis cluster efficiently. Finally, images are generated using xmgrace* for all variables in the
output for each diagnostic time step and placed into a remote directory where
the scientist can browse them via the Web-based dashboard application 25
(cf. Section 13.5). The split and merge operations are executed on the lo-
gin nodes of the primary simulation machine and on the secondary analysis
cluster, respectively. To make the plots, however, a job has to be submitted on the secondary cluster for each file in each step. Although each such job is small, lasting only a couple of seconds, there is almost always one running; executing these jobs on the login node of the primary cluster would therefore typically overload it.
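To make this concrete, the following Python sketch approximates the split, transfer, and merge loop for one growing NetCDF file. It is not the Kepler implementation used in the workflow: the host name, mirror directory, and the remote merge_step.py helper are hypothetical, and it assumes the netCDF4 package, that time is the leading unlimited dimension, and passwordless scp/ssh access to the analysis cluster.

# Minimal sketch (not the actual Kepler actors): watch a growing NetCDF file,
# split off each new diagnostic record, transfer it, and trigger a remote merge.
import os
import subprocess
import time
from netCDF4 import Dataset

ANALYSIS_HOST = "analysis.example.org"   # hypothetical secondary-cluster host
REMOTE_DIR = "/mirror/xgc1"              # hypothetical mirror directory

def record_count(path):
    """Number of records along the unlimited (time) dimension."""
    with Dataset(path) as ds:
        return len(ds.dimensions["time"])

def split(path, step):
    """Copy a single diagnostic record into a small standalone file."""
    chunk = f"{path}.step{step:05d}.nc"
    with Dataset(path) as src, Dataset(chunk, "w") as dst:
        dst.createDimension("time", None)
        for name, dim in src.dimensions.items():
            if name != "time":
                dst.createDimension(name, len(dim))
        for name, var in src.variables.items():
            out = dst.createVariable(name, var.dtype, var.dimensions)
            if "time" in var.dimensions:
                out[0] = var[step]       # newest record only; time assumed leading
            else:
                out[:] = var[:]          # static coordinate/metadata variable
    return chunk

def transfer_and_merge(chunk):
    """Ship the chunk and append it to the mirrored file on the remote side."""
    name = os.path.basename(chunk)
    subprocess.run(["scp", chunk, f"{ANALYSIS_HOST}:{REMOTE_DIR}/"], check=True)
    # merge_step.py is a hypothetical remote script that appends the record
    subprocess.run(["ssh", ANALYSIS_HOST,
                    f"python {REMOTE_DIR}/merge_step.py {REMOTE_DIR}/{name}"],
                   check=True)

def monitor(path, poll_seconds=30):
    """Poll the growing NetCDF file and process each new diagnostic step."""
    done = 0
    while True:                          # for the lifetime of the simulation
        total = record_count(path)
        for step in range(done, total):
            transfer_and_merge(split(path, step))
        done = total
        time.sleep(poll_seconds)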
The second pipeline (bottom of Figure 13.1) performs the BP-HDF5 pro-
cessing . This pipeline's role is similar to the NetCDF pipeline, but with the
following differences. For each step, XGC1 creates new BP files (a custom
binary-packed format); hence, there are no split and merge steps when trans-
ferring them to the secondary processing site. The BP files are converted to
HDF5 using an external code, and then images are created for all 2D slices
of the 3D data stored in those files using an AVS/Express dataflow network.
For this purpose, the pipeline starts AVS/Express as a remote job on the
secondary cluster and then makes image-creation requests to it as a (private)
service.
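The following sketch approximates that stage in Python: it invokes an external BP-to-HDF5 converter and then iterates over the 2D slices of a 3D dataset, issuing one image-creation request per slice. The converter name bp2h5, the dataset name, and the request_image stand-in for the AVS/Express service call are all assumptions; the chapter does not specify those details.

# Sketch of the BP-HDF5 stage (not the actual workflow code): convert a BP
# file to HDF5 with an external converter, then request an image for every
# 2D slice of a 3D dataset.  Converter name, dataset name, and the image
# request are placeholders for the real components.
import subprocess
import h5py

def convert_bp_to_hdf5(bp_path):
    """Run an external BP-to-HDF5 converter (name assumed here)."""
    h5_path = bp_path.replace(".bp", ".h5")
    subprocess.run(["bp2h5", bp_path, h5_path], check=True)
    return h5_path

def request_image(slice_2d, out_png):
    """Stand-in for the image-creation request to the AVS/Express service."""
    print(f"would render {slice_2d.shape} slice -> {out_png}")

def render_all_slices(h5_path, dataset="/field3d", axis=0):
    """Iterate over all 2D slices of a 3D dataset and render each one."""
    with h5py.File(h5_path, "r") as f:
        data = f[dataset]                     # hypothetical 3D dataset name
        for i in range(data.shape[axis]):
            index = [slice(None)] * 3
            index[axis] = i                   # i-th slice along the chosen axis
            request_image(data[tuple(index)], f"slice_{axis}_{i:04d}.png")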
This workflow uses a set of fine-grained job-control steps provided by Ke-
pler for calling AVS/Express. The workflow waits until the AVS/Express job
is started on its execution host, performs the other tasks while the job is run-
ning, and stops the job at the end of the processing. The individual steps in
Figure 13.1 are workflows themselves (i.e., subworkflows, or composite actors
in Kepler terminology), each implementing a specific task. One such subworkflow
is the archival step in the HDF5 pipeline, which assembles files into large
chunks and stores them in a remote mass-storage system. The steps used in
this workflow are described further in Podhorszki et al. 26
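The job-control pattern itself (submit the service job, wait until it is running, do the processing, stop it) can be sketched as follows. This is a hedged illustration rather than Kepler's actors: it assumes a PBS-style batch scheduler on the secondary cluster reachable over ssh, and the qsub/qstat/qdel commands, host name, and job script are placeholders.

# Sketch of the fine-grained job-control pattern: submit the AVS/Express
# service job, wait until it runs, use it, then stop it.  Commands, host
# name, and job script are assumptions, not the workflow's actual steps.
import subprocess
import time

HOST = "analysis.example.org"                 # hypothetical secondary cluster

def remote(*cmd):
    """Run a command on the secondary cluster and return its stdout."""
    return subprocess.run(["ssh", HOST, *cmd], check=True,
                          capture_output=True, text=True).stdout.strip()

def submit_service_job(script="start_avs_express.sh"):
    """Submit the long-running rendering job and return its job id."""
    return remote("qsub", script)

def wait_until_running(job_id, poll_seconds=10):
    """Block until the scheduler reports the job in the running state."""
    while "job_state = R" not in remote("qstat", "-f", job_id):
        time.sleep(poll_seconds)

def stop_job(job_id):
    """Cancel the service job once all image requests have been handled."""
    remote("qdel", job_id)

def run_pipeline(process_all_steps):
    """Bracket the per-step processing with job start-up and shutdown."""
    job_id = submit_service_job()
    wait_until_running(job_id)
    try:
        process_all_steps()                   # issue image-creation requests
    finally:
        stop_job(job_id)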
* A tool for graphing, advanced computation, and exploration of data; see http://plasma-gate.weizmann.ac.il/Grace/
http://www.avs.com/software/soft t/avsxps.html