Phylogenetic Analysis Workflows - User-Level Workflow Design: A Bioinformatics Perspective

data type classifications using terms like Sequence alignment (nucleic acid) or Sequence record (protein).

Services

Table 3.1 lists the services that are relevant for the following examples, along with their input and output data types. The set of input types contains all mandatory inputs (i.e., optional inputs are not considered), while the set of output types contains all possible outputs. Note that the service interface definitions consider only the data that is actually passed between the individual services; input parameters that are used merely for configuration purposes are not regarded as service inputs. The table comprises only 23 of the more than 430 services in the complete domain model. They provide functionality for the creation of molecular sequences (makenucseq, makeprotseq and ehmmemit), for basic processing of sequence data (e.g., trimseq and transeq), for phylogenetic analyses such as alignment and phylogenetic tree construction (e.g., emma, fdnacomp), and for phylogenetic tree visualization (fdrawtree, fdrawgram).
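The interface convention described above (mandatory inputs only, all possible outputs, configuration parameters ignored) can be illustrated with a small data model. The following is a minimal sketch in Python, not the representation used by the domain model itself: the class Service, the function can_follow, and in particular the data type labels assigned to each service are assumptions made for illustration and are not copied from Table 3.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical service interface: only data passed between services."""
    name: str
    inputs: frozenset   # mandatory input data types only
    outputs: frozenset  # all possible output data types


# Type labels below are illustrative assumptions, NOT the entries of Table 3.1.
NUC_SEQ = "Sequence record (nucleic acid)"
PROT_SEQ = "Sequence record (protein)"
NUC_ALN = "Sequence alignment (nucleic acid)"
TREE = "Phylogenetic tree"
IMAGE = "Image"

SERVICES = {
    s.name: s
    for s in [
        Service("makenucseq", frozenset(), frozenset({NUC_SEQ})),
        Service("transeq", frozenset({NUC_SEQ}), frozenset({PROT_SEQ})),
        Service("emma", frozenset({NUC_SEQ}), frozenset({NUC_ALN})),
        Service("fdnacomp", frozenset({NUC_ALN}), frozenset({TREE})),
        Service("fdrawtree", frozenset({TREE}), frozenset({IMAGE})),
    ]
}


def can_follow(producer: Service, consumer: Service) -> bool:
    """True if the producer's outputs cover all mandatory inputs of the consumer.

    This is a pairwise simplification: it ignores data produced by
    earlier steps of a longer workflow.
    """
    return consumer.inputs <= producer.outputs


if __name__ == "__main__":
    print(can_follow(SERVICES["makenucseq"], SERVICES["transeq"]))
    print(can_follow(SERVICES["makenucseq"], SERVICES["fdrawtree"]))
```

Under these assumed type assignments, makenucseq can feed transeq directly, whereas it cannot feed fdrawtree, because no tree has been produced at that point.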

Constraints

Initially, no domain constraints were defined for the EMBOSS domain model, in order to maintain its full potential for experimentation. Later, some of the constraints that arose from the experimentation described in the following were applied as domain-wide constraints. Because the EMBOSS services constitute a truly multi-purpose domain model (especially in contrast to the scenarios discussed in the next chapters), problem-specific constraints that are defined at workflow design time are more likely to be used.
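The distinction between domain-wide and problem-specific constraints can be pictured as two sets of predicates that are applied to candidate service sequences. The sketch below uses plain Python predicates and is not the constraint formalism of the framework itself: the helper names avoid, enforce, precedes and admissible, the example constraints, and the candidate sequences are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

Workflow = Sequence[str]                 # a candidate sequence of service names
Constraint = Callable[[Workflow], bool]  # True if the workflow is acceptable


def avoid(service: str) -> Constraint:
    """The given service must not occur in the workflow."""
    return lambda wf: service not in wf


def enforce(service: str) -> Constraint:
    """The given service must occur somewhere in the workflow."""
    return lambda wf: service in wf


def precedes(first: str, second: str) -> Constraint:
    """If `second` occurs, `first` must occur before its first occurrence."""
    def check(wf: Workflow) -> bool:
        if second not in wf:
            return True
        return first in wf[: wf.index(second)]
    return check


def admissible(wf: Workflow, constraints: List[Constraint]) -> bool:
    """A workflow is admissible if it satisfies every constraint."""
    return all(c(wf) for c in constraints)


# Hypothetical domain-wide constraint: fixed once for the whole domain model.
DOMAIN_CONSTRAINTS = [avoid("ehmmemit")]
# Hypothetical problem-specific constraints: chosen at workflow design time.
PROBLEM_CONSTRAINTS = [enforce("emma"), precedes("emma", "fdnacomp")]

# Hypothetical candidate sequences, not results of an actual synthesis run.
candidates = [
    ["makenucseq", "emma", "fdnacomp", "fdrawtree"],
    ["makenucseq", "fdnacomp", "fdrawtree"],
    ["ehmmemit", "emma", "fdnacomp", "fdrawtree"],
]

selected = [
    wf for wf in candidates
    if admissible(wf, DOMAIN_CONSTRAINTS + PROBLEM_CONSTRAINTS)
]

if __name__ == "__main__":
    for wf in selected:
        print(" -> ".join(wf))
```

Keeping the two lists separate mirrors the text: the domain-wide list stays small so that the domain model remains broadly usable, while the problem-specific list is adjusted per workflow design task.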

3.3.3 Exemplary Workflow Composition Problem

When developing bioinformatics analysis workflows, users often have a clear idea of the inputs and final results, while their conception of the process that actually produces the desired outputs is only vague. Figure 3.12 (top) shows a simple, loosely specified phylogenetic analysis workflow that reflects this starting point of workflow design: it begins with generating a set of random nucleotide sequences (using the EMBOSS service makenucseq) and ends with drawing and displaying a tree image (using fdrawtree and the viewer SIB of the jETI plugin, respectively). The first two SIBs are connected by a loosely specified branch (colored red). Note that the makenucseq service is used at this stage of the workflow design only to express the frame conditions in a convenient fashion: before the developed workflow is finally released, this SIB would be replaced by a service that reads a meaningful nucleotide sequence from, for instance, a database or a file. The synthesis
