4.1 Introduction
Many e-science applications can be modeled as workflow applications. In
this programming model, scientific applications are described as a set of
tasks that have dependencies between them. Normally, these dependencies are
expressed in the form of input and output (I/O) files: before a task can execute,
the tasks it depends on must have completed their execution, and the files they
generate must be available as its input.
Well-known application domains where workflow applications are used
include astrophysics, bioinformatics, and disaster modeling and prediction,
among others.
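As an illustration of this model, the following Python sketch represents a workflow as a set of tasks whose dependencies are given by the files they consume and produce. The task names, file names, and the minimal scheduler are hypothetical examples, not the implementation of any particular workflow system.

# A minimal sketch of the workflow model described above: tasks whose
# dependencies are expressed through the files they consume and produce.
# All task and file names are hypothetical and used only for illustration.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    inputs: set    # files that must be available before the task can run
    outputs: set   # files the task generates

def execute(tasks, initial_files):
    """Run tasks in an order that respects their file dependencies."""
    available = set(initial_files)
    pending = list(tasks)
    order = []
    while pending:
        # A task is ready once every file it depends on has been produced.
        ready = [t for t in pending if t.inputs <= available]
        if not ready:
            raise RuntimeError("cyclic or unsatisfiable dependencies")
        for t in ready:
            order.append(t.name)
            available |= t.outputs  # its outputs become inputs of downstream tasks
            pending.remove(t)
    return order

# Hypothetical three-task pipeline: each task consumes the previous task's output file.
workflow = [
    Task("align",   {"reads.fastq"},  {"aligned.bam"}),
    Task("analyze", {"aligned.bam"},  {"variants.vcf"}),
    Task("report",  {"variants.vcf"}, {"report.pdf"}),
]
print(execute(workflow, {"reads.fastq"}))  # ['align', 'analyze', 'report']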
Scientists have been successfully executing this type of application on
supercomputers, clusters, and grids. Recently, with the advent of clouds,
scientists started investigating the suitability of this infrastructure for
workflow applications.
Clouds are natural candidates for hosting workflow applications. This is
because some of their core characteristics, such as rapid elasticity, resource
pooling, and pay per use, are well suited to the nature of scientific applica-
tions that experience variable demand, spikes in resource utilization (e.g., of the
central processing unit [CPU] and disk), and, at times, urgency in generating
results. Furthermore, recent offerings of high-performance cloud computing
instances make it even more compelling for scientists to adopt clouds
as the platform of choice for hosting their scientific workflow applications.
The execution of workflow applications is a demanding task. Tasks, sometimes
numbering in the hundreds, need to have their execution coordinated.
They have to be submitted for execution in a specific virtual machine (VM),
and the required input files need to be made accessible for the application.
This may require the transfer of huge amounts of data between computing
hosts. The reception of user input, data transfers, task executions, and VMs can
all fail; in such cases, some action has to be carried out to reestablish the execution
of the application. Examples of such actions are retrying the data transfer,
rescheduling the task, or starting a new VM to execute the remaining tasks.
These activities are carried out by software called workflow management
systems (WfMSs). Examples of well-known WfMSs are Pegasus [1], Taverna [2],
Triana [3], and Cloudbus Workflow Engine [4].
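To illustrate the kind of recovery just described, the Python sketch below retries a failed data transfer, reschedules the task, and starts a replacement VM. The helper callables it receives (transfer_input, run_on_vm, provision_vm) are hypothetical placeholders supplied by the caller; they are not the API of Pegasus, Taverna, Triana, or the Cloudbus Workflow Engine.

# A minimal sketch of the recovery actions mentioned above. The callables
# transfer_input, run_on_vm, and provision_vm are hypothetical placeholders,
# not part of any real WfMS API.
import time

MAX_ATTEMPTS = 3

def run_task_with_recovery(task, vm, transfer_input, run_on_vm, provision_vm):
    """Stage the task's input files and run it, recovering from transient failures."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            transfer_input(task, vm)    # the data transfer may fail and be retried
            return run_on_vm(task, vm)  # the task execution may fail and be rescheduled
        except ConnectionError:
            time.sleep(2 ** attempt)    # back off, then retry the data transfer
        except RuntimeError:
            vm = provision_vm()         # VM failure: start a new VM for the remaining work
    raise RuntimeError(f"task {task!r} failed after {MAX_ATTEMPTS} attempts")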
As infrastructures and platforms evolve, so do the scientific applications
that use them. The amount of
data generated by scientific experiments is reaching the order of terabytes per
day, and huge capacity is required to process this data to enable scientific dis-
coveries. Therefore, WfMSs also need to evolve to support huge data sets and
the complex analytics required to extract useful insights from the generated
data. Even more important, when data are generated continuously, WfMSs need
to support real-time capabilities. This has to be achieved while other
nonfunctional requirements, such as data privacy, are still satisfied.