Although this holds true regardless of the specific infrastructure hosting
the workflow application, even more complexity is added to
the system when the applications are executed in clouds. This is because
extra capabilities are required to enable the WfMS to select the right number
of resources of the right type so that the computational task is performed
within a user-defined time frame and budget.
Because current WfMSs cannot execute workflows in clouds efficiently and
automatically while also providing adaptive execution, fault tolerance, and
data privacy, we developed extensions to a workflow engine [4] to support these
features. In this chapter, we detail the requirements of such a system, its
architecture, and the application scenario explored, along with an evaluation
of the system and a discussion of lessons learned during its development.
4.2 Workflow Applications
The workflow programming model is undoubtedly one of the most promi-
nent programming models in e-science, being used in a range of domains,
including bioinformatics, astrophysics, and disaster modeling, to name a
few. In this model, one application (job) is composed of a number of tasks
that have execution dependencies between them. Typically, the dependency
is related to I/O: One task takes the output of one or more other tasks
as its input; therefore, it cannot be executed until such data are available
(normally, after the tasks that produce the data have completed).
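The dependency rule described above can be illustrated with a minimal sketch. The task names (fetch, align, filter, merge) and the helper ready_tasks are hypothetical and chosen only for illustration; they are not taken from the workflow engine discussed in this chapter.

```python
def ready_tasks(inputs_of, completed):
    """Return the tasks that may start now.

    inputs_of maps each task to the set of tasks whose output it consumes.
    A task is ready when it has not run yet and every producer it depends
    on has already completed.
    """
    return sorted(
        task
        for task, producers in inputs_of.items()
        if task not in completed and producers <= completed
    )


# Hypothetical four-task job: "merge" needs the output of both
# "align" and "filter", which in turn both need the output of "fetch".
inputs_of = {
    "fetch": set(),
    "align": {"fetch"},
    "filter": {"fetch"},
    "merge": {"align", "filter"},
}

print(ready_tasks(inputs_of, set()))
print(ready_tasks(inputs_of, {"fetch"}))
```

In this sketch, only "fetch" is eligible at the start; once it completes, "align" and "filter" become eligible and could run in parallel, whereas "merge" must wait for both.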
Variations of the model exist in which the workflow also contains conditional
branches (i.e., particular tasks that compose the workflow may or may
not be executed depending on the results of previous tasks) or loops (in which
execution of specific sections of the workflow is repeated), or in which tasks
are allowed to start execution before their predecessors complete.
Without loss of generality, a workflow application can be formally represented
by a directed acyclic graph (DAG) whose vertices represent tasks
and whose directed edges represent the dependencies between tasks: An edge
A → B indicates that task B depends on task A for its execution. Such a
representation of workflow applications is also known as a DAG. A simple
workflow is depicted in Figure 4.1.
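As a rough illustration of the DAG representation, the sketch below encodes edges A → B as pairs and derives one valid execution order with Python's standard graphlib module (Python 3.9 or later). The four-task graph and the function name execution_order are assumptions made for this example; the graph is not the workflow of Figure 4.1.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(edges):
    """Return one valid execution order for a workflow DAG.

    edges is an iterable of (a, b) pairs, each meaning A -> B:
    task b depends on task a. Raises CycleError if the edges
    contain a cycle, i.e. the graph is not a DAG.
    """
    sorter = TopologicalSorter()
    for a, b in edges:
        # TopologicalSorter.add takes a node followed by its predecessors.
        sorter.add(b, a)
    return list(sorter.static_order())


# Hypothetical workflow: A feeds B and C, and both feed D.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
order = execution_order(edges)
print(order)  # A first, D last; the relative order of B and C may vary
```

Any order returned places each task after all the tasks it depends on. A scheduler would typically go further and run independent tasks such as B and C concurrently, which this sketch does not attempt.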
Traditionally, workflow applications have been extensively deployed in
high-performance infrastructures such as supercomputers and clusters [5].
When workflows were deployed on such infrastructures, emphasis was placed on
reducing the execution time of the workflow by optimizing the utilization of
the resources
available for the workflow. When grids became available, they were also
used for workflow execution [6, 7]. This added complexity to the schedul-
ing process because it was possible that resources available for execution