Lessons Learned - User-Level Workflow Design: A Bioinformatics Perspective

a clearly defined, basic unit of functionality with deterministic input/output

semantics. Semantically complex services, for instance those with alternative

runtime behavior under different circumstances (OR-semantics), cannot be

described and handled properly by the synthesis framework.

For example, consider the ClustalW service that is used in the GeneFisher-P

scenario (cf. Chapter 4). Using a set of molecular sequences

as input, it computes a multiple sequence alignment. More precisely, if the

input is a set of nucleic acid sequences, it computes a nucleic acid alignment,

and if the input is a set of amino acid sequences, the result is an amino

acid alignment. This alternative behavior, which depends on the concrete nature of

the input, cannot be expressed as such in the synthesis framework. Because this

distinction between the sequence types is not required in the scope of the

considered workflow scenario, however, ClustalW is simply annotated there

as taking a Sequence as input and producing a Sequence alignment (multiple) as

output (cf. Section 4.3.1).

Generally, however, treating services as multi-purpose services that can

be applied to different input data types often leads to a loss of precision, as

the output is typically described at an equally abstract level. That is, the

input data is “lifted” to a higher level from the perspective of the synthesis

algorithm. As a consequence, services that would actually match are no longer

recognized. For example, the result of ClustalW as defined above is a (general)

multiple sequence alignment, regardless of whether the input sequences

were nucleic acid or amino acid sequences. Hence, the synthesis is not able to

recognize the possibility of using the result as input for a specialized service that

works on multiple alignments of a particular sequence type, such as

the fdnaml and fproml phylogenetic tree construction services from

the EMBOSS scenario (cf. Section 3.3.2).
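The loss of precision described above can be illustrated with a small, self-contained sketch. It is not the synthesis algorithm of the framework, only a toy type-matching check. Apart from Sequence and Sequence alignment (multiple), all type names (for example "Nucleic acid alignment" and "Phylogenetic tree"), the PARENT taxonomy, and the helpers is_a and successors are assumptions made for illustration.

```python
# Toy taxonomy: child type -> parent type.
# Only "Sequence" and "Sequence alignment (multiple)" come from the text;
# the more specific type names are hypothetical.
PARENT = {
    "Nucleic acid sequence": "Sequence",
    "Amino acid sequence": "Sequence",
    "Nucleic acid alignment": "Sequence alignment (multiple)",
    "Amino acid alignment": "Sequence alignment (multiple)",
}


def is_a(data_type, required):
    """Return True if data_type equals required or is a descendant of it."""
    while data_type is not None:
        if data_type == required:
            return True
        data_type = PARENT.get(data_type)
    return False


# Service annotations as (input type, output type); illustrative only.
SERVICES = {
    "ClustalW": ("Sequence", "Sequence alignment (multiple)"),
    "fdnaml": ("Nucleic acid alignment", "Phylogenetic tree"),
    "fproml": ("Amino acid alignment", "Phylogenetic tree"),
}


def successors(available_type):
    """Names of services whose annotated input accepts available_type."""
    return sorted(
        name for name, (required, _) in SERVICES.items()
        if is_a(available_type, required)
    )


# A nucleic acid sequence is accepted by the generic ClustalW annotation ...
assert is_a("Nucleic acid sequence", SERVICES["ClustalW"][0])

# ... but the annotated output is only the general alignment type, so
# neither specialized tree service is recognized as a possible next step.
after_generic = successors(SERVICES["ClustalW"][1])
print(after_generic)

# With a precisely typed result, the matching service would be found.
print(successors("Nucleic acid alignment"))
```

In this sketch the general alignment type is not a descendant of either specialized alignment type, so the check finds no follow-up service, whereas a precisely typed result would match fdnaml.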

An interesting feature of PROPHETS domain models in this regard is

the possibility to define multiple service interface descriptions for one and

the same underlying service. For instance, instead of settling for

the vague ClustalW interface specification described above, the domain

modeler can simply specify two distinct alignment services, say ClustalW AA

and ClustalW NA, of which one computes an amino acid alignment from a set

of amino acid sequences, and the other a nucleic acid alignment from nucleic

acid sequences. This polymorphism of services is in fact a very useful feature.

However, in the current implementation of the framework it often causes

ambiguities with respect to the actual configuration of the synthesized workflows.

The reason is that the differences between these services are only visible

in the scope of the semantic domain model, but not at the level of the basic

SIB instances. Thus, given a SIB in the workflow model, the framework can

only guess which of the different service interfaces is to be applied.
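The ambiguity can be made concrete with a minimal sketch, assuming that each service interface description simply records the SIB it belongs to. The dictionary layout, the "sib" key, the type names, and the helper interfaces_for_sib are hypothetical and do not reflect the framework's actual data structures.

```python
# Two interface descriptions for one and the same underlying SIB.
# The layout and the type names are illustrative assumptions.
INTERFACES = {
    "ClustalW AA": {
        "sib": "ClustalW",
        "input": "Amino acid sequence",
        "output": "Amino acid alignment",
    },
    "ClustalW NA": {
        "sib": "ClustalW",
        "input": "Nucleic acid sequence",
        "output": "Nucleic acid alignment",
    },
}


def interfaces_for_sib(sib_name):
    """All interface descriptions that refer to the given SIB."""
    return sorted(
        name for name, spec in INTERFACES.items()
        if spec["sib"] == sib_name
    )


# Going from interface to SIB is unambiguous ...
assert INTERFACES["ClustalW AA"]["sib"] == "ClustalW"

# ... but going from the SIB back to an interface yields two candidates,
# and the SIB instance alone carries nothing to choose between them.
candidates = interfaces_for_sib("ClustalW")
print(candidates)
```

The mapping from interface to SIB is many-to-one, so the reverse lookup cannot be resolved without information that exists only in the semantic domain model.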

Finally, the experience with the bioinformatics applications also showed

that it is adequate to include only the actual input/output data that “flows”

into/out of the service in the semantic service interface descriptions, but not

the configuration parameters. While it is in principle possible also to regard
