Introduction - User-Level Workflow Design: A Bioinformatics Perspective

workflow developers from having to care how a processing step is performed - a paradigm also promoted by the vision of grid computing, where computing resources are supposed to be transparently allocated from wherever capacity is available” [128]. That is, the ideal workflow system should be capable of transparently choosing one service out of a group of services that provide equivalent functionality, without bothering the user with the technical differences.
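
The idea of transparently choosing among equivalent services can be illustrated with a minimal Python sketch. It is not taken from any particular workflow system: the `Service` class, the `function` label, the `available` flag, and the example service names are all hypothetical and serve only to show how a workflow step might name a functionality rather than a concrete service.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Service:
    """Hypothetical description of a service known to the workflow system."""
    name: str                    # concrete provider, hidden from the workflow designer
    function: str                # label for the functionality the service offers
    available: bool              # whether the provider currently has capacity
    run: Callable[[str], str]    # the actual processing step


def choose_service(services: List[Service], function: str) -> Service:
    """Return the first available service that offers the requested functionality."""
    for service in services:
        if service.function == function and service.available:
            return service
    raise LookupError(f"no available service for {function!r}")


# Two hypothetical providers of the same functionality (reversing a sequence).
services = [
    Service("provider_a", "reverse_sequence", False, lambda s: s[::-1]),
    Service("provider_b", "reverse_sequence", True, lambda s: "".join(reversed(s))),
]

chosen = choose_service(services, "reverse_sequence")
result = chosen.run("ACGT")
```

The workflow only asks for `"reverse_sequence"`; which provider executes the step is decided at run time and could just as well depend on load or cost instead of a simple availability flag.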

Semantic Handling of Data: Compatibility

The scientific communities have developed numerous data formats, reflecting various applications and technical requirements. These formats remain in use when the tools are provided as remote services, so workflows often have to deal with heterogeneous and incompatible data. In fact, this heterogeneity is one of the main obstacles to service composition and tool interoperation [127]. For instance, there are around 20 common formats for biological sequences alone, and, to complicate matters further, many available tools and databases use tool-specific ASCII or binary formats rather than one of the more or less common formats. What is more, in the technical terms of the service interfaces, the textual formats are too often classified only as “strings”, which neither supports reasoning about type compatibility nor helps users to work with them. Accordingly, workflow systems have to find a more satisfying way of dealing with the numerous different data types.

In principle, there are two ways to improve the handling of heterogeneous and incompatible data formats:

1. Standardization, that is, the introduction of a homogeneous system of more specific data formats. This approach has been taken by several standardization efforts in all application domains. However, a homogeneous standard technology that incorporates all data types is hard to achieve. Even if standards are established for parts of the data type “jungle”, it is impossible to change all existing software accordingly and thereby fully replace the historically grown formats.

2. Automatic adaptation, that is, adding comprehensive annotations in terms of semantic metadata to the existing data types [301] and using small services that simply convert one data format into another (so-called “shim services”) to achieve compatibility. This approach is more pragmatic than striving solely for standardization, as the annotation is less invasive and can be applied to any resource at any time.

This way, if standards exist, they can (and should) still be used, but their combination with non-standard data types is also managed.
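
The second approach can be made concrete with a small Python sketch. It assumes, purely for illustration, that every data item carries a semantic format label and that the workflow system keeps a registry of shim converters between labels. The labels `fasta`, `tab`, and `raw`, the `SHIMS` registry, and all function names are hypothetical and do not describe the interface of any real workflow system.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

Shim = Callable[[str], str]


def fasta_to_tab(text: str) -> str:
    """Shim: FASTA records -> one 'name<TAB>sequence' line per record."""
    records: List[List[str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])   # start a new record
        elif records:
            records[-1][1] += line           # append to the current sequence
    return "\n".join(f"{name}\t{seq}" for name, seq in records)


def tab_to_raw(text: str) -> str:
    """Shim: 'name<TAB>sequence' lines -> bare sequences, one per line."""
    return "\n".join(
        line.split("\t", 1)[1] for line in text.splitlines() if "\t" in line
    )


# Hypothetical registry: (source label, target label) -> shim service.
SHIMS: Dict[Tuple[str, str], Shim] = {
    ("fasta", "tab"): fasta_to_tab,
    ("tab", "raw"): tab_to_raw,
}


def find_shim_chain(source: str, target: str) -> Optional[List[Shim]]:
    """Breadth-first search for the shortest chain of shims from source to target."""
    if source == target:
        return []
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, chain = queue.popleft()
        for (src, dst), shim in SHIMS.items():
            if src != current or dst in seen:
                continue
            new_chain = chain + [shim]
            if dst == target:
                return new_chain
            seen.add(dst)
            queue.append((dst, new_chain))
    return None  # the formats cannot be bridged with the registered shims


def adapt(data: str, source: str, target: str) -> str:
    """Convert data from the source format to the target format via shims."""
    chain = find_shim_chain(source, target)
    if chain is None:
        raise ValueError(f"no shim chain from {source!r} to {target!r}")
    for shim in chain:
        data = shim(data)
    return data


converted = adapt(">s1\nACGT\nTTGA\n", "fasta", "raw")
```

Because the labels are more specific than “string”, the system can decide on its own that a FASTA output does not fit a tool expecting raw sequences and insert the two shims between the steps. A standardized format would simply appear as one more label in the registry.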

In analogy to the domain-specific service classifications that have been outlined in the previous section, a detailed semantic description of the data
