it exposes a number of challenges, such as how to organize huge datasets and
coordinate distributed execution. To address these challenges, a plethora of technologies
and innovations have come together to enable e-Science (Foster and Kesselman
2006). Nowadays, complex scientific experiments designed following the e-Science
paradigm are performed using geographically distributed instruments, data and
computing resources. These newly designed experiments are costly, time-consuming,
and multidisciplinary. They require not only
access to geographically distributed hardware and software resources, but also
extensive support to foster best practices, dissemination, and re-use.
Recently, Scientific Workflow Management Systems (SWMS) have become part
of the science infrastructure for realizing e-Science, owing to their intuitive approach
to prototyping experiments while concealing the complexity of the underlying
middleware. SWMS are also instrumental in research collaborations, since knowledge
about experiments and data is easily shared through these systems. This paradigm of
designing, executing and sharing experiments enables scientists to focus on problem
solving within their domain, whilst intricate knowledge about the underlying resources
and workflow execution is hidden behind the SWMS. In essence, SWMS strive
to bridge the knowledge gap between the computational sciences and the myriad of
distributed computing technologies. To date, many workflow systems have been
developed, and they vary considerably in terms of workflow modeling, scheduling and
targeted resources (Chin et al. 2002; McClatchey and Vossen 1997). The central
component in a SWMS is the workflow. A workflow can be described as a connected
graph that abstractly represents the flow of an experiment, whereby vertices
represent the activities and edges represent dependencies between activities.
Based on this graph, the SWMS orchestrates the execution of the activities across the
needed resources according to the application flow description.
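To make the graph view of a workflow concrete, the following is a minimal Python sketch (3.9+, standard-library `graphlib`) under stated assumptions: activities are vertices, each activity lists the activities it depends on, and the graph is acyclic. The activity names and the `execution_stages` helper are hypothetical and purely illustrative; this is not the VL-e implementation, and a real SWMS would add resource selection, data transfer and fault handling on top of this ordering step.

```python
from graphlib import TopologicalSorter

# Hypothetical experiment: each key is an activity (vertex) and each value is
# the set of activities it depends on (incoming edges). Names are illustrative.
workflow = {
    "fetch_data": set(),
    "fetch_reference": set(),
    "align": {"fetch_data", "fetch_reference"},
    "analyze": {"align"},
    "visualize": {"analyze"},
    "report": {"analyze", "visualize"},
}


def execution_stages(graph):
    """Group activities into stages; every activity in a stage has all of its
    dependencies satisfied, so the stage could be dispatched concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        stages.append(ready)
        # A real engine would run each activity on some resource here;
        # this sketch simply marks it as finished.
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for i, stage in enumerate(execution_stages(workflow), start=1):
        print(f"stage {i}: {', '.join(stage)}")
```

Running it prints five stages, starting with `fetch_data, fetch_reference` (which have no dependencies) and ending with `report`. Grouping ready activities into stages is just one simple choice; an engine could instead dispatch each activity as soon as its own dependencies complete.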
New technologies such as grids and, more recently, clouds allow the coordination and
sharing of unprecedented quantities of geographically distributed computing and
storage power by groups of trusted users within Virtual Organizations (Pang 2001).
Such environments have made it possible to design and build global distributed
collaborations involving large numbers of scientists and resources, and make data-
and computing-intensive scientific experiments feasible (Hey and Trefethen 2002).
Within the e-Science community, workflow management systems have been adopted
as the main approach to designing and simulating complex systems (Chin et al.
2002; McClatchey and Vossen 1997). A Scientific Workflow Management System
explicitly models the dependencies between scientific experiment processes.
This chapter describes a way to build a workflow management system for
e-Science that supports the different phases of the lifecycle of a
typical e-Science experiment. The presented results originate from the Virtual
Laboratory for e-Science (VL-e) project,¹ whose aim is to realize an e-Science
framework in which scientists from different domains can share their knowledge
and resources, and perform domain-specific research. In this project complex
¹ www.vl-e.nl