13.3.2 Issues in Simulation Management
In a compute-intensive workflow, jobs are typically executed remotely from the workflow execution engine, and thus the output of one job often must be transferred to another host where subsequent jobs are executed. For efficiency reasons, files are transmitted directly between remote sites, while the workflow engine only “sees” reference tokens to the remote files. The XGC1 case study falls into this category of workflows. It also belongs to a category of data-intensive workflows in which the data produced during a supercomputing simulation must be processed on the fly and as quickly as possible. This scenario is typical of most scientific simulations that use supercomputers and produce ever-larger amounts of data as the size and speed of supercomputer clusters continue to increase.
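As a rough sketch of this reference-token pattern (not the actual workflow engine used for XGC1), the following Python fragment passes a lightweight token describing a remote file between workflow steps and triggers a direct site-to-site copy, so the engine itself never handles the file contents. The FileRef class, the transfer helper, and the host and path names are illustrative assumptions.

    # Hypothetical sketch: the engine passes reference tokens, not file contents.
    # FileRef, transfer(), and the host/path names are illustrative, not part of
    # the actual engine used for the XGC1 workflow.
    from dataclasses import dataclass
    import subprocess

    @dataclass(frozen=True)
    class FileRef:
        """Reference token for a file that lives on a remote site."""
        host: str
        path: str

    def transfer(src: FileRef, dest_host: str, dest_path: str) -> FileRef:
        """Copy the file directly between the two remote sites (here via scp);
        the workflow engine never touches the bytes itself."""
        subprocess.run(["scp", f"{src.host}:{src.path}", f"{dest_host}:{dest_path}"],
                       check=True)
        return FileRef(dest_host, dest_path)

    # A simulation step on the supercomputer produced a diagnostics file; the
    # engine only holds a token saying where that output lives.
    sim_output = FileRef("supercomputer.example.org", "/scratch/run42/diag.bp")

    # Hand the data to the analysis cluster by a direct site-to-site copy and
    # pass the new token on to the next job in the workflow.
    analysis_input = transfer(sim_output, "analysis.example.org", "/data/run42/diag.bp")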
Runtime Decision Support. The typical tasks that a computational scientist performs during and after a simulation run are often tedious to carry out manually without automation support. For instance, to maintain high utilization of supercomputing resources, it is essential to be able to detect and halt a divergent simulation. Thus, in most scientific simulations, the status of the computation must be regularly checked to ensure that it is not diverging given the initial input parameters. However, checking the status of an executing simulation can be difficult because the user typically has to log in to the primary supercomputer cluster (since applications usually write data to local disks) at regular intervals to analyze diagnostic values that reveal errors in the input or simulation code. Moreover, simulations typically write out other (more involved) diagnostic data, such as physical variables or derivatives of these variables, which must be plotted and analyzed. Although such plots give deeper insight into the current state of the simulation, even more information may be needed for monitoring and runtime decision support, for example, the ability to visually analyze parts of the dataset written out by the simulation. The latter operation usually cannot be done on the supercomputer's login node, however, which is one of the reasons for transferring data to another, secondary computer such as the scientist's desktop or a dedicated visualization computer. Although not described in detail here, the CPES project has automated these various tasks via a separate workflow that greatly reduces the amount of manual work required of users by automatically routing diagnostic information and data, and by displaying the appropriate plots and visualizations on a Web-based dashboard.
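As a concrete (and deliberately simplified) illustration of the kind of check such a workflow automates, the sketch below periodically reads a diagnostic value written by the running simulation and cancels the batch job when the value indicates divergence. The diagnostic file format, the threshold, and the use of SLURM's scancel are assumptions made for the example; they do not describe the actual CPES monitoring workflow.

    # Simplified divergence check; the diagnostic format, threshold, and use of
    # SLURM's scancel are assumptions, not the CPES project's implementation.
    import subprocess
    import time

    DIAG_FILE = "/scratch/run42/energy_diag.txt"  # one diagnostic value per line (assumed)
    THRESHOLD = 1.0e12                            # beyond this the run is treated as divergent
    JOB_ID = "123456"                             # batch job to cancel on divergence

    def latest_diagnostic(path: str) -> float:
        """Return the most recently written diagnostic value (0.0 if none yet)."""
        with open(path) as f:
            lines = [line.strip() for line in f if line.strip()]
        return float(lines[-1]) if lines else 0.0

    while True:
        value = latest_diagnostic(DIAG_FILE)
        if abs(value) > THRESHOLD:
            # Halt the divergent simulation to free supercomputing resources.
            subprocess.run(["scancel", JOB_ID], check=True)
            print(f"Run cancelled: diagnostic {value:.3e} exceeded {THRESHOLD:.1e}")
            break
        time.sleep(300)  # re-check every five minutes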
Data Archiving. Another important task is the archiving of output data. At present, it is sufficient to archive data after the simulation run. In the near future, however, it is anticipated that the largest simulations will create more data in a single run than can fit onto the disk system of the supercomputers. Therefore, files must be transferred to a remote mass storage system on the fly and then removed from the local disk to make space for more data coming
from the simulation. There is also a requirement to create “archival chunks” of an intermediate size; for performance reasons, neither individual files nor the entire output of a run is a suitable unit for archiving.
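A minimal sketch of this on-the-fly archiving pattern follows: newly written files are bundled into tar archives of an intermediate target size, each bundle is shipped to mass storage, and the local copies are deleted only after the transfer succeeds. The chunk size, paths, and use of tar and scp are assumptions chosen for illustration rather than the project's actual archiving workflow.

    # Illustrative on-the-fly archiving into intermediate-size chunks; the paths,
    # chunk size, and scp destination are assumed for the example.
    import os
    import subprocess
    import tarfile

    CHUNK_BYTES = 10 * 1024**3          # target archive size: ~10 GB per chunk (assumed)
    SCRATCH = "/scratch/run42/output"   # where the simulation writes its files
    ARCHIVE_DEST = "hpss.example.org:/archive/run42"  # remote mass storage (assumed)

    def archive_chunk(files, chunk_id):
        """Bundle a group of files, transfer the bundle, then free local disk space."""
        chunk_path = f"/scratch/run42/chunk_{chunk_id:05d}.tar"
        with tarfile.open(chunk_path, "w") as tar:
            for f in files:
                tar.add(f, arcname=os.path.basename(f))
        subprocess.run(["scp", chunk_path, ARCHIVE_DEST], check=True)
        for f in files + [chunk_path]:
            os.remove(f)                # delete only after the transfer succeeded

    # Group whatever the simulation has written so far into ~CHUNK_BYTES bundles.
    pending, size, chunk_id = [], 0, 0
    for name in sorted(os.listdir(SCRATCH)):
        path = os.path.join(SCRATCH, name)
        if not os.path.isfile(path):
            continue
        pending.append(path)
        size += os.path.getsize(path)
        if size >= CHUNK_BYTES:
            archive_chunk(pending, chunk_id)
            pending, size, chunk_id = [], 0, chunk_id + 1
    # Files below the target size stay on local disk until enough new output accumulates.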