Scientific Process Automation and Workflow Management - Scientific Data Management

this information, Wings generates a workflow instance that specifies the computations (but not where they will take place) and the new data products. For all the new data products, it generates metadata attributes by propagating metadata from the input data through the descriptions and constraints specified for each of the components.
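
The metadata propagation step can be illustrated with a minimal Python sketch. It assumes each component description carries two things: a rule that derives its output's metadata from its inputs' metadata, and an optional constraint those inputs must satisfy. The names `Component` and `propagate`, and the example components `Normalize` and `Merge`, are hypothetical. They do not reflect how Wings actually represents components.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Metadata = Dict[str, object]

@dataclass
class Component:
    """Hypothetical component description: inputs, one output, a rule that
    derives the output's metadata, and an optional constraint on the inputs."""
    name: str
    inputs: List[str]
    output: str
    rule: Callable[[List[Metadata]], Metadata]
    constraint: Optional[Callable[[List[Metadata]], bool]] = None

def propagate(components: List[Component], metadata: Dict[str, Metadata]) -> Dict[str, Metadata]:
    """Attach derived metadata to every new data product.

    Assumes `components` is already in dependency order, so each component's
    inputs have metadata by the time it is visited."""
    result = dict(metadata)
    for comp in components:
        in_meta = [result[name] for name in comp.inputs]
        if comp.constraint is not None and not comp.constraint(in_meta):
            raise ValueError(f"inputs violate the constraints of {comp.name}")
        result[comp.output] = comp.rule(in_meta)
    return result

# Illustrative components: Normalize keeps its input's attributes and marks
# the data as normalized; Merge requires both inputs to share units.
normalize = Component("Normalize", ["raw"], "norm",
                      lambda ms: {**ms[0], "normalized": True})
merge = Component("Merge", ["norm", "ref"], "merged",
                  lambda ms: {"records": ms[0]["records"] + ms[1]["records"],
                              "units": ms[0]["units"]},
                  constraint=lambda ms: ms[0]["units"] == ms[1]["units"])

out = propagate([normalize, merge],
                {"raw": {"records": 10, "units": "K"},
                 "ref": {"records": 5, "units": "K"}})
print(out["merged"])  # {'records': 15, 'units': 'K'}
```

In this sketch, a constraint violation stops propagation. That mirrors the idea that metadata is derived only for data products the component descriptions actually allow.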

13.4.2 Mapping Workflows to Resources

Often, the target resources have not yet been chosen at the time the workflow is designed. Workflow mapping refers to the process of generating an executable workflow from a resource-independent workflow description, sometimes called an abstract workflow. In some cases the user performs the mapping directly by selecting the appropriate resources; in other cases, the workflow system performs the mapping.
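
The minimal Python sketch below makes the distinction between abstract and executable workflows concrete. The abstract workflow names only tasks and their dependencies, and mapping adds a concrete execution site to each task. The function `map_workflow`, the site names, and the rule "pick the alphabetically first site that has the required executable installed" are hypothetical simplifications. A practical mapper would rank candidate sites more carefully.

```python
from typing import Dict, List, Set

def map_workflow(abstract: Dict[str, List[str]],
                 task_needs: Dict[str, str],
                 resources: Dict[str, Set[str]]) -> Dict[str, dict]:
    """Bind each task of a resource-independent (abstract) workflow to a site.

    abstract:   task -> list of parent tasks (the dependency graph)
    task_needs: task -> executable the task runs
    resources:  site -> set of executables installed at that site
    """
    executable = {}
    for task, parents in abstract.items():
        # Candidate sites are those with the required code installed; sorting
        # makes the choice deterministic (a real mapper would rank them).
        candidates = sorted(site for site, installed in resources.items()
                            if task_needs[task] in installed)
        if not candidates:
            raise LookupError(f"no resource can run {task!r}")
        executable[task] = {"parents": list(parents), "site": candidates[0]}
    return executable

abstract = {"extract": [], "analyze": ["extract"]}
needs = {"extract": "untar", "analyze": "blast"}
sites = {"clusterA": {"untar"}, "clusterB": {"untar", "blast"}}
plan = map_workflow(abstract, needs, sites)
print(plan["analyze"])  # {'parents': ['extract'], 'site': 'clusterB'}
```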

Different approaches are taken to the mapping process depending on whether the workflow is composed of stand-alone applications or of individual services. In the case of service-based workflows, mapping consists of finding and binding to services appropriate for executing a given high-level function. Service-based workflows can also take quality-of-service requirements into account when performing the mapping. In the case of workflows composed of stand-alone applications, the mapping involves not only finding the resources needed to execute the computations and performing various optimizations, but possibly also modifying the original workflow.
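
Service binding with quality-of-service requirements can be sketched in a few lines of Python. The sketch assumes a registry in which each entry advertises the function it implements, a response latency, and an availability figure. Mapping then means filtering the matching services by the stated requirements and binding to the fastest one that remains. The `Service` record, the `bind` function, the endpoints, and the "lowest latency wins" policy are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Service:
    """Hypothetical registry entry for a deployed service."""
    endpoint: str
    function: str          # high-level functionality the service implements
    latency_ms: float      # advertised response time
    availability: float    # fraction of time the service is reachable (0..1)

def bind(function: str, registry: List[Service],
         max_latency_ms: float, min_availability: float) -> Optional[Service]:
    """Find services implementing `function`, drop those violating the QoS
    requirements, and bind to the fastest remaining one (None if none qualify)."""
    eligible = [s for s in registry
                if s.function == function
                and s.latency_ms <= max_latency_ms
                and s.availability >= min_availability]
    return min(eligible, key=lambda s: s.latency_ms, default=None)

registry = [
    Service("https://a.example.org/align", "align", 120.0, 0.99),
    Service("https://b.example.org/align", "align", 40.0, 0.90),
    Service("https://c.example.org/fold", "fold", 10.0, 0.999),
]
chosen = bind("align", registry, max_latency_ms=200, min_availability=0.95)
print(chosen.endpoint)  # https://a.example.org/align
```

Here the faster `b` service is rejected because its availability falls below the requirement. Tightening or relaxing the requirements changes which service is bound.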

Some systems, such as Taverna, rely on the user to choose resources or services. In Taverna, the user can provide a set of services that match a particular workflow component, so that if an error occurs, an alternate service can be invoked automatically. Newer versions of Taverna will include late service-binding capabilities.
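
The fallback behavior, in which the user supplies several equivalent services and an alternate is tried when one fails, can be sketched as a simple retry loop in Python. The function `invoke_with_alternates` and the stand-in services `primary` and `backup` are hypothetical. This illustrates the idea only and is not Taverna's API.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def invoke_with_alternates(services: Sequence[Callable[..., T]], *args, **kwargs) -> T:
    """Call each user-supplied service in turn; return the first successful
    result and fall through to the next alternate when one raises."""
    errors = []
    for service in services:
        try:
            return service(*args, **kwargs)
        except Exception as exc:  # any failure means: try the next alternate
            errors.append(exc)
    raise RuntimeError(f"all {len(services)} services failed: {errors!r}")

# Stand-ins for two services matching the same workflow component.
def primary(x):
    raise ConnectionError("primary service unreachable")

def backup(x):
    return x * 2

print(invoke_with_alternates([primary, backup], 21))  # 42
```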

Kepler, on the other hand, allows the user to specify resource bindings through its distributed computation configuration system. The user designs the workflow so that it indicates which components are compute intensive and should be distributed across remote computational resources. The user is then presented with a dialog listing available compute resources, which can include both other Kepler peers and remote Kepler slaves running on computing clusters (see the example in Figure 13.2). The user selects which set should be used for the execution, and the Kepler execution engine then determines a schedule for data transfer and job execution based on the execution model specified in the abstract workflow. In addition, Kepler can be used to configure and submit jobs to a variety of other grid-based computing systems, including Griddles [50], Nimrod [51], and others.
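
The kind of scheduling decision described above can be sketched in Python. The sketch assumes the user has marked some components as compute intensive and picked a list of remote resources. Marked components are assigned to those resources in round-robin order, all other components run locally, and a transfer step is emitted whenever data must move between resources. The function `schedule`, the round-robin policy, and the `local` placeholder are assumptions for illustration. This is not how Kepler's execution engine computes its schedule, which depends on the workflow's execution model. The code requires Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter
from itertools import cycle
from typing import Dict, List, Set, Tuple

def schedule(deps: Dict[str, Set[str]], distributed: Set[str],
             selected: List[str], local: str = "local") -> List[Tuple[str, str]]:
    """Return ("run" | "transfer", detail) steps in dependency order.

    deps:        task -> set of tasks it depends on
    distributed: tasks the user marked as compute intensive
    selected:    remote resources the user picked (hypothetical names)
    """
    if distributed and not selected:
        raise ValueError("compute-intensive tasks need at least one resource")
    remote = cycle(selected)  # round-robin over the chosen resources
    site: Dict[str, str] = {}
    plan: List[Tuple[str, str]] = []
    for task in TopologicalSorter(deps).static_order():
        site[task] = next(remote) if task in distributed else local
        # Move each parent's output if it was produced somewhere else.
        for parent in sorted(deps.get(task, ())):
            if site[parent] != site[task]:
                plan.append(("transfer", f"{parent}:{site[parent]}->{site[task]}"))
        plan.append(("run", f"{task}@{site[task]}"))
    return plan

deps = {"fetch": set(), "sim1": {"fetch"}, "sim2": {"sim1"}, "plot": {"sim2"}}
for step in schedule(deps, distributed={"sim1", "sim2"}, selected=["peerA", "clusterB"]):
    print(step)
```

Round-robin is only a stand-in policy. Replacing the `cycle` with a cost-based choice would keep the rest of the sketch unchanged.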

Triana can interface with a variety of execution environments, using the GAT (Grid Application Toolkit) [34] for task-based workflows and the GAP (Grid Application Prototype) for service-based workflows. In the case of a service-based workflow, a user can provide information about the services to invoke (or locate them via a repository). Alternatively, a user can create a
