provides a form of documentation that can be used to validate and
interpret results produced by (often complex) scientific processes.
Scientific workflow systems often can optimize and then more efficiently
execute scientific processes, for example, by exposing and exploiting various
forms of parallelism inherent in data-driven scientific processes, as well as
by employing other techniques for efficient resource management.
Workflow environments encourage the reuse of knowledge artifacts (actors,
workflows, etc.) developed when automating a scientific process, both within
and across disciplines.
13.3 Case Study: Fusion Simulation Management
We now present a detailed case study to make the previously discussed notions
more concrete. We chose a simulation management workflow as our example
because it exhibits a number of challenging issues typically not found in other
types of scientific workflows. In our terminology, the workflow is a
resource-oriented, production workflow. The main scientific computations (the
fusion simulation) are performed on a remote supercomputer cluster, while the
management workflow can be executed on the scientist's desktop. The overall
computation managed by the workflow is both data intensive and compute
intensive; involves pipeline parallelism over a stream of data or reference
tokens*; and is responsible for job management, file transfers, and data
archiving. Such workflows have been called “plumbing” workflows due to their
focus on explicitly dealing with underlying resources (which the end-user
scientist prefers not to deal with). From the scientist's point of view, the
primary task is to observe and analyze the simulation results as soon as
possible. To understand this challenge, we first describe briefly the physics
problems studied in the simulations.
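To make the idea of pipeline parallelism over a stream of reference tokens more
concrete, the following is a minimal sketch in Python and not the workflow
described in this chapter. It assumes three hypothetical stages (transfer,
convert, archive) connected by queues. Each stage runs in its own thread, and
only file names (reference tokens) flow between stages, never the data itself.
The stage bodies are placeholders that merely rewrite the name.

```python
import queue
import threading

SENTINEL = None  # marks the end of the token stream


def stage(work, inbox, outbox):
    """Apply `work` to each token from `inbox` and pass the result downstream."""
    while True:
        token = inbox.get()
        if token is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            return
        outbox.put(work(token))


# Hypothetical stage bodies: each receives and returns a file name only.
def transfer(name):
    return name.replace("remote:", "local:")


def convert(name):
    return name + ".converted"


def archive(name):
    return name + ".archived"


def run_pipeline(tokens):
    """Push reference tokens through the three stages, which run concurrently."""
    works = [transfer, convert, archive]
    queues = [queue.Queue() for _ in range(len(works) + 1)]
    threads = [
        threading.Thread(target=stage, args=(work, queues[i], queues[i + 1]))
        for i, work in enumerate(works)
    ]
    for thread in threads:
        thread.start()

    # Feed the stream; a later token can enter the first stage while an
    # earlier one is still being handled by a downstream stage.
    for token in tokens:
        queues[0].put(token)
    queues[0].put(SENTINEL)

    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)

    for thread in threads:
        thread.join()
    return results
```

Calling `run_pipeline(["remote:step1.dat", "remote:step2.dat"])` returns the
tokens in their original order, because each stage is a single thread reading
from a first-in, first-out queue. In a real management workflow the stage
bodies would start file transfers or submit jobs rather than rewrite strings.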
The Center for Plasma Edge Simulation (CPES) is a Department of Energy SciDAC
project [23] requiring close collaboration between physicists, applied
mathematicians, and computer scientists. Together, these researchers have
developed a complete plasma fusion simulation, called XGC1 [24], that runs in
a high-performance computing environment. The computational scientists use
this simulation to study the behavior of hot plasma in a tokamak-type fusion
reactor. The central issue under study is as follows. Within a
fusion reactor, if the hot edge of the plasma is allowed to contact the
reactor wall in an uncontrolled way, it can sputter the wall material into
the plasma, degrade or extinguish the fusion burn, and shorten the wall
* A reference token is a “logical pointer” to a data object, for example, a file name.