Adaptive Execution of Scientific Workflow Applications on Clouds - Cloud Computing with e-Science Applications


Although this is true regardless of the specific infrastructure hosting the workflow application, even more complexity is added to the system when applications are executed in clouds. This is because the WfMS needs extra capabilities to select the right number of resources of the right type, so that the computational task completes within a user-defined time frame and budget.
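To make the resource-selection problem concrete, here is a minimal Python sketch that rests on strong simplifying assumptions. The work is taken to be perfectly divisible across instances of a single type, with no inter-task dependencies. Each instance type is described only by a hypothetical throughput and hourly price, and billing is per started hour. The names `InstanceType` and `cheapest_plan` are illustrative and belong to no real WfMS or cloud API. The sketch enumerates every (type, count) pair and keeps the cheapest one that meets both the deadline and the budget.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class InstanceType:
    """Hypothetical VM type; the fields are illustrative assumptions."""
    name: str
    speed: float           # work units one instance processes per hour
    price_per_hour: float  # price of one instance, billed per started hour


def cheapest_plan(work: float, deadline_hours: float, budget: float,
                  types: List[InstanceType],
                  max_instances: int = 100) -> Optional[Tuple[float, str, int]]:
    """Return (cost, type name, instance count) of the cheapest plan that
    finishes `work` within the deadline and budget, or None if none exists.

    Assumes the work splits perfectly across instances of a single type.
    """
    best: Optional[Tuple[float, str, int]] = None
    for t in types:
        for n in range(1, max_instances + 1):
            hours = work / (t.speed * n)  # wall-clock time with n instances
            if hours > deadline_hours:
                continue  # too slow; try more instances
            # Every instance is billed for each started hour.
            cost = n * math.ceil(hours) * t.price_per_hour
            if cost <= budget and (best is None or cost < best[0]):
                best = (cost, t.name, n)
    return best


types = [InstanceType("small", speed=1.0, price_per_hour=0.1),
         InstanceType("large", speed=4.0, price_per_hour=0.35)]
print(cheapest_plan(10, deadline_hours=4, budget=5, types=types))
```

With these made-up numbers the sketch picks five small instances at a cost of 1.0, even though three small instances would already meet the 4-hour deadline (at a cost of 1.2). Because billing rounds up to whole hours, the smallest instance count that meets the deadline is not always the cheapest. That is why the loop checks every count instead of stopping at the first feasible one.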

Because current WfMSs cannot efficiently and automatically execute workflows in clouds with support for adaptive execution, fault tolerance, and data privacy, we developed extensions to a workflow engine [4] that provide these features. In this chapter, we detail the requirements of such a system, its architecture, and the application scenario explored, along with an evaluation of the system and a discussion of lessons learned during its development.

4.2 Workflow Applications

The workflow programming model is undoubtedly one of the most prominent programming models in e-science. It is used in a range of domains, including bioinformatics, astrophysics, and disaster modeling, to name a few. In this model, one application (job) is composed of a number of tasks that have execution dependencies between them. Typically, the dependency is related to I/O: a task takes the output of one or more other tasks as its input. Therefore, it cannot be executed until those data are available, which normally happens after the producing tasks have completed.
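The I/O-driven dependency described above can be illustrated with a short Python sketch. Each task declares the files it reads and writes, a dependency edge is derived whenever one task reads a file that another writes, and a task becomes runnable once all of its input files exist. The file-based task description, the helper names `derive_dependencies` and `ready_tasks`, and the assumption that each file has exactly one producing task are all hypothetical choices made for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical task description: name -> (input files, output files).
Tasks = Dict[str, Tuple[Set[str], Set[str]]]


def derive_dependencies(tasks: Tasks) -> Set[Tuple[str, str]]:
    """Return edges (a, b) meaning task b reads a file that task a writes.

    Assumes each file is written by exactly one task.
    """
    producer: Dict[str, str] = {}
    for name, (_, outputs) in tasks.items():
        for f in outputs:
            producer[f] = name
    edges: Set[Tuple[str, str]] = set()
    for name, (inputs, _) in tasks.items():
        for f in inputs:
            if f in producer and producer[f] != name:
                edges.add((producer[f], name))
    return edges


def ready_tasks(tasks: Tasks, available: Set[str], done: Set[str]) -> Set[str]:
    """Tasks that have not run yet and whose input files all exist."""
    return {name for name, (inputs, _) in tasks.items()
            if name not in done and inputs <= available}


tasks: Tasks = {
    "A": (set(), {"a.dat"}),
    "B": ({"a.dat"}, {"b.dat"}),
    "C": ({"a.dat"}, {"c.dat"}),
    "D": ({"b.dat", "c.dat"}, {"d.dat"}),
}
print(sorted(derive_dependencies(tasks)))
print(ready_tasks(tasks, available=set(), done=set()))
print(ready_tasks(tasks, available={"a.dat"}, done={"A"}))
```

Initially only `A` is runnable. Once `A` has produced `a.dat`, both `B` and `C` become runnable and could execute in parallel, while `D` must wait for both of their outputs.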

Variations of the model exist in which the workflow also contains conditional branches (i.e., particular tasks that compose the workflow may or may not be executed depending on the results of previous tasks) or loops (in which execution of specific sections of the workflow is repeated), or in which tasks are allowed to start execution before their predecessors complete.

Without loss of generality, a workflow application can be formally represented by a directed acyclic graph (DAG) whose vertices represent tasks and whose directed edges represent the dependencies between tasks: an edge A → B indicates that task B depends on task A for its execution. Workflows represented in this way are therefore also commonly referred to as DAGs. A simple workflow is depicted in Figure 4.1.
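One practical consequence of the DAG representation is that any valid execution order is a topological order of the graph, and a cycle means the workflow is not a DAG at all. The following Python sketch uses Kahn's algorithm, a standard technique chosen here for illustration rather than one taken from this chapter, to compute such an order and reject cyclic graphs. The function name `topological_order` and the diamond-shaped example workflow are hypothetical and are not necessarily the workflow shown in Figure 4.1.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def topological_order(tasks: List[str],
                      edges: Iterable[Tuple[str, str]]) -> List[str]:
    """Kahn's algorithm: return the tasks in an order that respects every
    edge (a, b), meaning 'b depends on a'; raise ValueError on a cycle."""
    successors: Dict[str, List[str]] = {t: [] for t in tasks}
    indegree: Dict[str, int] = {t: 0 for t in tasks}
    for a, b in edges:
        successors[a].append(b)
        indegree[b] += 1
    # Tasks with no unfinished predecessors can start immediately.
    queue = deque(t for t in tasks if indegree[t] == 0)
    order: List[str] = []
    while queue:
        t = queue.popleft()
        order.append(t)
        for s in successors[t]:
            indegree[s] -= 1  # one fewer predecessor left to wait for
            if indegree[s] == 0:
                queue.append(s)
    if len(order) != len(tasks):
        raise ValueError("dependency graph contains a cycle; not a DAG")
    return order


# A diamond-shaped workflow: A -> B, A -> C, B -> D, C -> D.
print(topological_order(["A", "B", "C", "D"],
                        [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]))
```

A scheduler could dispatch tasks in this order, or, more usefully on parallel resources, run every task that reaches zero in-degree at the same step concurrently.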

Traditionally, workflow applications have been extensively deployed on high-performance infrastructures such as supercomputers and clusters [5]. When deployed on such infrastructures, emphasis was given to reducing the execution time of the workflow by optimizing the utilization of the resources available to it. When grids became available, they were also used for workflow execution [6, 7]. This added complexity to the scheduling process because it was possible that resources available for execution
