In this section, we see how provenance enables connecting research results
to high-level workflows in an astronomy application.
The application we look at is Montage [45]. Montage produces science-grade
mosaics of the sky on demand. This application can be structured as a
workflow that takes a number of images, projects them, adjusts their
backgrounds, and adds the images together. A mosaic of 6 degrees square would
involve processing 1,444 input images, require 8,586 computational steps, and
generate 22,850 intermediate data products. Executing the Montage workflow may
require numerous distributed resources, which may be shared with other users.
Because of the complexity of the workflow and the fact that resources often
change or fail, it is infeasible for users to define a workflow that is
directly executable over these resources. Instead, scientists use “workflow
compilers” such as Pegasus [18, 42] (see Chapter 13) to generate the
executable workflow from a high-level, resource-independent description of the
end-to-end computation (an abstract workflow). This approach gives scientists
a computation description that is portable across execution platforms and can
be mapped to any number of resources. However, the additional workflow mapping
also widens the gap between what the user defines and what is actually
executed by the system, and thus complicates the interpretation of the
results: the connection between the scientific results and the original
experiment is lost.
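The gap between an abstract and an executable workflow can be illustrated with
a minimal Python sketch. It is not the Pegasus API: the class names, the
round-robin site assignment, the file names, and the site names are all
assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractStep:
    """A resource-independent step (hypothetical type, not a Pegasus class)."""
    name: str
    inputs: tuple
    outputs: tuple


@dataclass(frozen=True)
class ExecutableStep:
    """An abstract step bound to a concrete execution site."""
    abstract: AbstractStep
    site: str


def compile_workflow(abstract_steps, available_sites):
    """Bind each abstract step to a site.

    Round-robin assignment is an assumption chosen for simplicity; a real
    workflow compiler would use resource and data-location information.
    """
    if not available_sites:
        raise ValueError("at least one execution site is required")
    return [
        ExecutableStep(step, available_sites[i % len(available_sites)])
        for i, step in enumerate(abstract_steps)
    ]


# A three-step, Montage-like abstract workflow (file names are made up).
abstract = [
    AbstractStep("project", ("raw_1.fits",), ("proj_1.fits",)),
    AbstractStep("adjust_background", ("proj_1.fits",), ("corr_1.fits",)),
    AbstractStep("add", ("corr_1.fits",), ("mosaic.fits",)),
]
executable = compile_workflow(abstract, ["site_a", "site_b"])
```

The user writes only the `abstract` list. The site bindings in `executable`
are decided by the compiler, which is exactly the information that is lost to
the user unless it is recorded.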
To reconnect the scientific results with the experiment, Miles et al. [19] and
Miles et al. [88] present a system for tracking the provenance of a mosaic
back to the abstract workflow from which it was generated. The system
integrates the PASOA [54] and Pegasus systems to answer provenance questions
such as which input images were retrieved from a specific archive, whether
parameters for the re-projections were set correctly, what execution platforms
were used, and whether those platforms included processors with a known
floating-point processing error.
To accomplish this, each stage of the compilation from abstract workflow to
executable workflow is tracked in Pegasus. For example, one of Pegasus's
features is to select the sites or platforms at which each computational step
should be executed. During the Pegasus compilation process, this information
is stored as process documentation within PASOA's provenance store.
Additionally, this information is linked to the subsequent compilation steps,
such as intermediate data registration and task clustering. Finally, during
the execution of the Pegasus-produced workflow, all execution information is
stored and linked to the workflow within the provenance store. Using this
process documentation, a provenance graph of the resulting sky mosaic can be
generated that leads back to the specific site selected.
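The linking of process documentation described above can be sketched as a
small in-memory store in Python. This is an assumption-laden illustration, not
the PASOA interface: the `ProvenanceStore` class, its methods, and every
record identifier below are hypothetical.

```python
from collections import defaultdict


class ProvenanceStore:
    """Toy stand-in for a provenance store (hypothetical, not PASOA's API)."""

    def __init__(self):
        self._records = {}
        self._causes = defaultdict(list)  # record id -> ids it depends on

    def record(self, record_id, kind, detail, caused_by=()):
        """Store one piece of process documentation and its causal links."""
        self._records[record_id] = {"kind": kind, "detail": detail}
        self._causes[record_id].extend(caused_by)

    def trace(self, record_id):
        """Follow causal links backward, visiting each record once."""
        seen, lineage, stack = set(), [], [record_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            lineage.append((current, self._records[current]))
            stack.extend(self._causes[current])
        return lineage


store = ProvenanceStore()
# Compilation stages are documented and linked as they happen.
store.record("abstract:project_1", "abstract_step", "project raw_1.fits")
store.record("compile:site_selection", "compilation", "site_a",
             caused_by=["abstract:project_1"])
store.record("compile:clustering", "compilation", "cluster_0",
             caused_by=["compile:site_selection"])
# Execution information is linked to the compiled workflow.
store.record("exec:job_17", "execution", "ran on site_a",
             caused_by=["compile:clustering"])
store.record("data:mosaic.fits", "data", "final mosaic",
             caused_by=["exec:job_17"])

lineage = store.trace("data:mosaic.fits")
selected_sites = [rec["detail"] for rid, rec in lineage
                  if rid == "compile:site_selection"]
```

Tracing from the mosaic walks through the execution record and the
compilation records back to the abstract step, so a question such as "which
site was selected for this result?" becomes a filter over `lineage`.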
The availability of provenance in Montage enables astronomers to take
advantage of workflow automation technologies while still retaining all the
information necessary to reproduce and verify their results. Beyond Montage,
provenance is an underpinning technology for workflow automation, which is
itself necessary for other large-scale grid-based science applications, such
as those in astrophysics [89].