4.1 Introduction
Many e-science applications can be modeled as workflow applications. In
this programming model, scientific applications are described as a set of
tasks that have dependencies between them. Normally, these dependencies are
expressed in the form of input and output (I/O) files: before a task can execute,
the tasks it depends on must have completed their execution, and the files they
generate must be available as its input.
Well-known application domains where workflow applications are used
include astrophysics, bioinformatics, and disaster modeling and prediction,
among others.
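As an illustration of this model, the following Python sketch represents a workflow as a set of tasks whose dependencies are given by the files they consume and produce. The task names, file names, and the minimal scheduler are hypothetical examples, not the implementation of any particular workflow system.

# A minimal sketch of the workflow model described above: tasks whose
# dependencies are expressed through the files they consume and produce.
# All task and file names are hypothetical and used only for illustration.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    inputs: set    # files that must be available before the task can run
    outputs: set   # files the task generates

def execute(tasks, initial_files):
    """Run tasks in an order that respects their file dependencies."""
    available = set(initial_files)
    pending = list(tasks)
    order = []
    while pending:
        # A task is ready once every file it depends on has been produced.
        ready = [t for t in pending if t.inputs <= available]
        if not ready:
            raise RuntimeError("cyclic or unsatisfiable dependencies")
        for t in ready:
            order.append(t.name)
            available |= t.outputs  # its outputs become inputs of downstream tasks
            pending.remove(t)
    return order

# Hypothetical three-task pipeline: each task consumes the previous task's output file.
workflow = [
    Task("align",   {"reads.fastq"},  {"aligned.bam"}),
    Task("analyze", {"aligned.bam"},  {"variants.vcf"}),
    Task("report",  {"variants.vcf"}, {"report.pdf"}),
]
print(execute(workflow, {"reads.fastq"}))  # ['align', 'analyze', 'report']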
Scientists have been successfully executing this type of application on
supercomputers, clusters, and grids. Recently, with the advent of clouds,
scientists started investigating the suitability of this infrastructure for
workflow applications.
Clouds are natural candidates for hosting workflow applications. This is
because some of their core characteristics, such as rapid elasticity, resource
pooling, and pay per use, are well suited to the nature of scientific applica-
tions that experience variable demand, spikes in resource utilization (e.g., of the
central processing unit [CPU] and disk), and, at times, urgency in generating
results. Furthermore, recent offerings of high-performance cloud computing
instances make it even more compelling for scientists to adopt clouds
as the platform of choice for hosting their scientific workflow applications.
The execution of workflow applications is a demanding task. Tasks, sometimes
numbering in the hundreds, need to have their execution coordinated.
They have to be submitted for execution in a specific virtual machine (VM),
and the required input files need to be made accessible for the application.
This may require the transfer of huge amounts of data between computing
hosts. The reception of user input, data transfers, task executions, and VMs can
all fail; in such cases, some action has to be carried out to reestablish the execution
of the application. Examples of such actions are retrying the data transfer,
rescheduling the task, or starting a new VM to execute the remaining tasks.
These activities are carried out by software called workflow management
systems (WfMSs). Examples of well-known WfMSs are Pegasus [1], Taverna [2],
Triana [3], and Cloudbus Workflow Engine [4].
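To illustrate the kind of recovery just described, the Python sketch below retries a failed data transfer, reschedules the task, and starts a replacement VM. The helper callables it receives (transfer_input, run_on_vm, provision_vm) are hypothetical placeholders supplied by the caller; they are not the API of Pegasus, Taverna, Triana, or the Cloudbus Workflow Engine.

# A minimal sketch of the recovery actions mentioned above. The callables
# transfer_input, run_on_vm, and provision_vm are hypothetical placeholders,
# not part of any real WfMS API.
import time

MAX_ATTEMPTS = 3

def run_task_with_recovery(task, vm, transfer_input, run_on_vm, provision_vm):
    """Stage the task's input files and run it, recovering from transient failures."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            transfer_input(task, vm)    # the data transfer may fail and be retried
            return run_on_vm(task, vm)  # the task execution may fail and be rescheduled
        except ConnectionError:
            time.sleep(2 ** attempt)    # back off, then retry the data transfer
        except RuntimeError:
            vm = provision_vm()         # VM failure: start a new VM for the remaining work
    raise RuntimeError(f"task {task!r} failed after {MAX_ATTEMPTS} attempts")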
As infrastructures and platforms evolve, so do the scientific applications
that use them. The amount of
data generated by scientific experiments is reaching the order of terabytes per
day, and huge capacity is required to process this data to enable scientific dis-
coveries. Therefore, WfMSs also need to evolve to support huge data sets and
the complex analytics required to extract useful insights from the generated
data. Even more important, when data are generated continuously, WfMSs need
to support real-time capabilities. This has to be achieved while other
nonfunctional requirements, such as data privacy, are still satisfied.