13.3.2 Issues in Simulation Management
In a compute-intensive workflow, jobs are typically executed remotely from the workflow execution engine, and thus the output of one job often must be transferred to another host where subsequent jobs are executed. For efficiency reasons, files are transmitted directly between remote sites, while the workflow engine only “sees” reference tokens to the remote files. The XGC1 case study falls into this category of workflows. It also belongs to a category of data-intensive workflows in which the data produced during a supercomputing simulation must be processed on the fly and as quickly as possible. This scenario is typical of most scientific simulations that use supercomputers and produce ever-larger amounts of data as the size and speed of supercomputer clusters continue to increase.
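As a rough sketch of this reference-token pattern (not the actual workflow engine used for XGC1), the following Python fragment passes a lightweight token describing a remote file between workflow steps and triggers a direct site-to-site copy, so the engine itself never handles the file contents. The FileRef class, the transfer helper, and the host and path names are illustrative assumptions.

    # Hypothetical sketch: the engine passes reference tokens, not file contents.
    # FileRef, transfer(), and the host/path names are illustrative, not part of
    # the actual engine used for the XGC1 workflow.
    from dataclasses import dataclass
    import subprocess

    @dataclass(frozen=True)
    class FileRef:
        """Reference token for a file that lives on a remote site."""
        host: str
        path: str

    def transfer(src: FileRef, dest_host: str, dest_path: str) -> FileRef:
        """Copy the file directly between the two remote sites (here via scp);
        the workflow engine never touches the bytes itself."""
        subprocess.run(["scp", f"{src.host}:{src.path}", f"{dest_host}:{dest_path}"],
                       check=True)
        return FileRef(dest_host, dest_path)

    # A simulation step on the supercomputer produced a diagnostics file; the
    # engine only holds a token saying where that output lives.
    sim_output = FileRef("supercomputer.example.org", "/scratch/run42/diag.bp")

    # Hand the data to the analysis cluster by a direct site-to-site copy and
    # pass the new token on to the next job in the workflow.
    analysis_input = transfer(sim_output, "analysis.example.org", "/data/run42/diag.bp")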
Runtime Decision Support. The typical tasks that a computational scientist performs during and after a simulation run are often tedious to carry out manually without automation support. For instance, to maintain high utilization of supercomputing resources, it is essential to be able to detect and halt a divergent simulation. Thus, in most scientific simulations, the status of the computation must be regularly checked to ensure that it is not diverging given the initial input parameters. However, checking the status of an executing simulation can be difficult because the user typically has to log in to the primary supercomputer cluster (since applications usually write data to local disks) at regular intervals to analyze diagnostic values that reveal errors in the input or simulation code. Moreover, simulations typically write out other (more involved) diagnostic data, such as physical variables or derivatives of these variables, which must be plotted and analyzed. Although such plots give deeper insight into the current state of the simulation, even more information may be needed for monitoring and runtime decision support, for example, the ability to visually analyze parts of the dataset written out by the simulation. The latter operation usually cannot be done on the supercomputer's login node, however, which is one of the reasons for transferring data to another, secondary computer such as the scientist's desktop or a dedicated visualization computer. Although not described in detail here, the CPES project has automated these various tasks via a separate workflow that greatly reduces the amount of manual work required of users by automatically routing diagnostic information and data, and by displaying the appropriate plots and visualizations on a Web-based dashboard.
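As a concrete (and deliberately simplified) illustration of the kind of check such a workflow automates, the sketch below periodically reads a diagnostic value written by the running simulation and cancels the batch job when the value indicates divergence. The diagnostic file format, the threshold, and the use of SLURM's scancel are assumptions made for the example; they do not describe the actual CPES monitoring workflow.

    # Simplified divergence check; the diagnostic format, threshold, and use of
    # SLURM's scancel are assumptions, not the CPES project's implementation.
    import subprocess
    import time

    DIAG_FILE = "/scratch/run42/energy_diag.txt"  # one diagnostic value per line (assumed)
    THRESHOLD = 1.0e12                            # beyond this the run is treated as divergent
    JOB_ID = "123456"                             # batch job to cancel on divergence

    def latest_diagnostic(path: str) -> float:
        """Return the most recently written diagnostic value (0.0 if none yet)."""
        with open(path) as f:
            lines = [line.strip() for line in f if line.strip()]
        return float(lines[-1]) if lines else 0.0

    while True:
        value = latest_diagnostic(DIAG_FILE)
        if abs(value) > THRESHOLD:
            # Halt the divergent simulation to free supercomputing resources.
            subprocess.run(["scancel", JOB_ID], check=True)
            print(f"Run cancelled: diagnostic {value:.3e} exceeded {THRESHOLD:.1e}")
            break
        time.sleep(300)  # re-check every five minutes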
Data Archiving. Another important task is the archiving of output data. At present, it is sufficient to archive data after the simulation run. In the near future, however, it is anticipated that the largest simulations will create more data in a single run than can fit onto the disk system of the supercomputers. Therefore, files must be transferred to a remote mass storage system on the fly and then removed from the local disk to make space for more data coming
from the simulation. There is also a requirement to create “archival chunks” of an intermediate size; for performance reasons, neither individual files nor the entire output of a run is a suitable unit for archiving.
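A minimal sketch of this on-the-fly archiving pattern follows: newly written files are bundled into tar archives of an intermediate target size, each bundle is shipped to mass storage, and the local copies are deleted only after the transfer succeeds. The chunk size, paths, and use of tar and scp are assumptions chosen for illustration rather than the project's actual archiving workflow.

    # Illustrative on-the-fly archiving into intermediate-size chunks; the paths,
    # chunk size, and scp destination are assumed for the example.
    import os
    import subprocess
    import tarfile

    CHUNK_BYTES = 10 * 1024**3          # target archive size: ~10 GB per chunk (assumed)
    SCRATCH = "/scratch/run42/output"   # where the simulation writes its files
    ARCHIVE_DEST = "hpss.example.org:/archive/run42"  # remote mass storage (assumed)

    def archive_chunk(files, chunk_id):
        """Bundle a group of files, transfer the bundle, then free local disk space."""
        chunk_path = f"/scratch/run42/chunk_{chunk_id:05d}.tar"
        with tarfile.open(chunk_path, "w") as tar:
            for f in files:
                tar.add(f, arcname=os.path.basename(f))
        subprocess.run(["scp", chunk_path, ARCHIVE_DEST], check=True)
        for f in files + [chunk_path]:
            os.remove(f)                # delete only after the transfer succeeded

    # Group whatever the simulation has written so far into ~CHUNK_BYTES bundles.
    pending, size, chunk_id = [], 0, 0
    for name in sorted(os.listdir(SCRATCH)):
        path = os.path.join(SCRATCH, name)
        if not os.path.isfile(path):
            continue
        pending.append(path)
        size += os.path.getsize(path)
        if size >= CHUNK_BYTES:
            archive_chunk(pending, chunk_id)
            pending, size, chunk_id = [], 0, chunk_id + 1
    # Files below the target size stay on local disk until enough new output accumulates.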