data type classifications using terms like Sequence alignment (nucleic acid)
or Sequence record (protein).
Services
Table 3.1 lists the services that are relevant for the following examples, along
with their input and output data types. The set of input types contains
all mandatory inputs (i.e., optional inputs are not considered), while the
set of output types contains all possible outputs. Note that the service
interface definitions only consider the data that is actually passed between
the individual services; input parameters that are merely used for
configuration purposes are not regarded as service inputs. The table
comprises only 23 of the more than 430 services in the complete domain model.
These services provide functionality for the creation of molecular sequences
(makenucseq, makeprotseq and ehmmemit), for basic processing of sequence
data (e.g. trimseq and transeq), for phylogenetic analyses such as alignments
and phylogenetic tree construction (e.g. emma, fdnacomp), and for
phylogenetic tree visualization (fdrawtree, fdrawgram).
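The interface convention described above (mandatory inputs only, all possible outputs) can be illustrated with a minimal Python sketch. The service names and type labels used here are hypothetical placeholders, not entries from Table 3.1, and the representation is only one possible way to model such interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Interface of a service: data passed between services only."""
    name: str
    inputs: frozenset   # mandatory input types (optional inputs are ignored)
    outputs: frozenset  # all output types the service may produce


def applicable(service, available):
    """A service can be used once all its mandatory input types are available."""
    return service.inputs <= available


def after_running(service, available):
    """Types available after the service has run (existing types are kept)."""
    return available | service.outputs


# Hypothetical services and type labels, chosen for illustration only.
gen_sequences = Service("gen_sequences", frozenset(), frozenset({"nucleic_sequence"}))
align = Service("align", frozenset({"nucleic_sequence"}), frozenset({"alignment", "tree_file"}))
draw_tree = Service("draw_tree", frozenset({"tree_file"}), frozenset({"image"}))

state = after_running(gen_sequences, frozenset())
print(applicable(draw_tree, state))   # tree_file is not yet available
print(applicable(align, state))       # nucleic_sequence is available
```

Configuration parameters are deliberately absent from this model, in line with the rule that they are not regarded as service inputs.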
Constraints
Initially, no domain constraints were defined for the EMBOSS domain model
in order to maintain its full potential for experimentation. Later, some of the
constraints that emerged from the experiments described in the following
were applied as domain-wide constraints. As the EMBOSS services constitute
a truly multi-purpose domain model (especially in contrast to the scenarios
that are discussed in the next chapters), problem-specific constraints that
are defined at workflow design time are more likely to be used.
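The distinction between domain-wide and problem-specific constraints can be sketched as follows. This is an illustration under assumptions, not the constraint formalism of the framework itself: constraints are modelled as plain Python predicates over a candidate service sequence, and all service names and the helpers avoid, require and satisfies are hypothetical.

```python
from typing import Callable, List, Sequence

# Assumption: a constraint is a predicate over a candidate service sequence.
Constraint = Callable[[Sequence[str]], bool]


def avoid(service: str) -> Constraint:
    """The given service must not occur in the workflow."""
    return lambda workflow: service not in workflow


def require(service: str) -> Constraint:
    """The given service must occur somewhere in the workflow."""
    return lambda workflow: service in workflow


def satisfies(workflow: Sequence[str], constraints: List[Constraint]) -> bool:
    return all(check(workflow) for check in constraints)


# Domain-wide constraints hold for every problem in the domain ...
domain_wide = [avoid("legacy_tool")]
# ... while problem-specific ones are added at workflow design time.
problem_specific = [require("align")]

candidates = [
    ["gen_sequences", "align", "draw_tree"],
    ["gen_sequences", "legacy_tool", "align", "draw_tree"],
    ["gen_sequences", "draw_tree"],
]
accepted = [wf for wf in candidates if satisfies(wf, domain_wide + problem_specific)]
print(accepted)
```

Keeping the two lists separate reflects the point made above: in a multi-purpose domain the domain-wide list stays short, and most filtering comes from constraints supplied per problem.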
3.3.3 Exemplary Workflow Composition Problem
When developing bioinformatics analysis workflows, users often have a clear
idea about the inputs and final results, while their conception of the process
that actually produces the desired outputs is only vague. Figure 3.12 (top)
shows a simple, loosely specified phylogenetic analysis workflow that reflects
this starting point of workflow design: it begins with generating a set of
random nucleotide sequences (using the EMBOSS service makenucseq) and ends
with drawing a tree image (using fdrawtree) and displaying it (using the
viewer SIB of the jETI plugin). The first two SIBs are connected by
a loosely specified branch (colored red). Note that the makenucseq service
is used at this stage of the workflow design only to express the frame
conditions in a convenient fashion: before the finished workflow is
released, this SIB would be replaced by a service that reads a meaningful
nucleotide sequence from, for instance, a database or a file. The synthesis
 