be completely automated and seamlessly integrated into the overall anal-
ysis process, which leads to much faster and significantly less error-prone
execution of standard data processing tasks. Furthermore, a large number
of repeated runs of the same experiment may reveal information about
possible experimental errors.
Scientific workflow systems (cf., e.g., [27, 106, 355] for surveys) support
and automate the execution of error-prone, repetitive tasks such as data
access, transformation, and analysis. In contrast to manual execution of com-
putational experiments (i.e., manually invoking the single steps one after
another), creating and running workflows from services increases the speed
and reliability of the experiments:
Workflows accelerate analysis processes significantly. The difference be-
tween manual and automatic execution time increases with the complex-
ity of the workflow and with the amount of data to which it is applied. As
manual analyses require the full attention of an (experienced) human user,
they are furthermore expensive, as they can easily consume
a considerable amount of manpower. For instance, assume a workflow
that needs 5 minutes to perform an analysis which requires 20 minutes
when carried out manually. When applied to 100 data sets, it runs for
8:20 h, while the human user would be occupied for 33:20 h, which cor-
responds to almost a man-week of work. What is more, the automatic
analysis workflows run autonomously in the background, possibly also
overnight, so that the researcher can focus on other tasks in
the meantime.
Workflows achieve a consistent analysis process. By applying the same
parameters to each data set, they directly produce comparable and repro-
ducible results. Such consistency cannot be guaranteed when the analysis
is carried out repeatedly by a human user, who naturally becomes tired and
inattentive when performing the same steps again and again. When the
analyses are carried out by different people, the situation gets even worse,
as achieving consistent behavior across different users is more difficult
still.
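As a quick sanity check of the figures in the example above, the claimed speed-up can be reproduced with a few lines of Python; the function name and structure here are purely illustrative and not taken from the text:

```python
def total_hours(minutes_per_run: float, num_datasets: int) -> float:
    """Total processing time in hours for a batch of data sets."""
    return minutes_per_run * num_datasets / 60

# 100 data sets, 5 min per automated run vs. 20 min per manual run
automated = total_hours(5, 100)
manual = total_hours(20, 100)

print(f"automated: {automated:.2f} h")  # 8.33 h  (= 8:20 h)
print(f"manual:    {manual:.2f} h")     # 33.33 h (= 33:20 h)
```

The gap of roughly 25 hours of human attention per 100 data sets is what makes automation pay off, and it grows linearly with the number of data sets.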
Focusing on their actual technical realization, [193] describes the design
and development of scientific workflows in terms of a five-phase life cycle (cf.
Figure 1.5): The starting point is the formulation of a scientific hypothesis that
has to be tested or specific experimental goals that have to be reached. In
the subsequent workflow design phase the corresponding workflow is shaped.
Services and data must be put together to form a workflow that addresses
the identified research problem. It is also possible that (parts of) existing
workflows can be re-used or adapted to meet the needs of the new workflow.
The workflow preparation phase is then concerned with the more technical
preparations (e.g., specific parameter settings or data bindings) which
are the prerequisites for workflow enactment in the execution phase. Finally,
there is a so-called post-execution analysis phase, meaning the inspection and
 