alignment) and about key numbers like the number of groups or the alignment
score. Finally, the resulting multiple alignment is also displayed.
In addition to using the interactive command line interface, it is possible to
run ClustalW in a non-interactive command line mode, that is, the input files
and parameters are already specified in the command line call, and the result
is printed on the console or written into a separate file. Altogether, ClustalW
provides a quite “atomic” (i.e. small and self-contained) functional unit with
well-defined inputs and outputs, which can be used easily in different contexts.
Therefore, it is also often provided as a web service, allowing easy programmatic
access via the internet.
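As an illustration of the non-interactive mode, the following Python sketch assembles such a call and, optionally, executes it. It assumes a ClustalW 2.x executable named `clustalw2` on the search path and relies on its `-INFILE`, `-ALIGN`, `-OUTFILE` and `-OUTPUT` options; the helper functions and the file names are hypothetical and chosen for illustration only.

```python
import shutil
import subprocess


def build_clustalw_command(infile, outfile, executable="clustalw2", output_format=None):
    """Assemble a non-interactive ClustalW call as an argument list.

    Input file and parameters are fixed up front, so no interactive
    menu is needed. `executable` is an assumption about the local setup.
    """
    command = [executable, f"-INFILE={infile}", "-ALIGN", f"-OUTFILE={outfile}"]
    if output_format is not None:
        # For example "PHYLIP"; if omitted, ClustalW uses its default format.
        command.append(f"-OUTPUT={output_format}")
    return command


def run_clustalw(infile, outfile, executable="clustalw2"):
    """Run the alignment and return whatever ClustalW prints on the console."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} not found on the search path")
    command = build_clustalw_command(infile, outfile, executable)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only prints the call; use run_clustalw() to actually execute it.
    print(" ".join(build_clustalw_command("sequences.fasta", "sequences.aln")))
```

Keeping the command construction separate from its execution mirrors the "atomic" character of the tool: the same call can be logged, tested, or handed to a workflow engine without change.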
3.2 Variations of a Multiple Sequence Alignment Workflow
This first example of bioinformatics workflows with Bio-jETI is woven around
multiple sequence alignments as described in Section 3.1.2. It is based on a
number of data retrieval and sequence analysis services that are provided by
the DNA Data Bank of Japan (DDBJ) [223, 165]. This example illustrates in
particular the agility of workflow design with Bio-jETI: workflows can easily
be modified, adapted, customized and tested in its graphical user interface,
and (parts of) workflows can be prepared and flexibly (re-)combined at
the user level according to current analysis objectives. The example also
demonstrates that user interaction can easily be included in the workflows
by using the SIB libraries that are shipped with the jABC framework.
The following gives an overview of the DDBJ services and describes the
implementation of a minimal alignment workflow in Bio-jETI, before presenting
a number of possible variations of the basic workflow. For a more elaborate
description of this example, the reader is referred to [172].
3.2.1 DDBJ Services
The DDBJ [145] is a major database provider in Asia that forms the International
Nucleotide Sequence Database (INSD) collaboration [72] together with
the NCBI's GenBank [270] and the EBI's EMBL [132]. Via its Web API [165]
the DDBJ provides Web Services for a variety of tasks. At the time of writing,
18 services (with a total of 124 individual operations) are available.
Of these, the following are used in the examples presented below:
ARSA is a DDBJ-specific keyword search system that supports simple and
complex searches against a number of databases.
GetEntry can be used to retrieve entries from a variety of common
databases.
Blast [29] searches databases for similar sequences.
ClustalW (cf. Section 3.1.3) computes multiple sequence alignments.
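One plausible way to chain such services into a minimal alignment workflow (keyword search, entry retrieval, alignment) can be sketched in Python as follows. This is an illustration under stated assumptions only: the function names, signatures and stand-in implementations are hypothetical and reproduce neither the DDBJ Web API nor the Bio-jETI building blocks. In a real setting, each callable would wrap the corresponding remote service.

```python
from typing import Callable, List


def alignment_workflow(
    query: str,
    search: Callable[[str], List[str]],
    get_entry: Callable[[str], str],
    align: Callable[[List[str]], str],
) -> str:
    """Keyword search -> entry retrieval -> multiple sequence alignment.

    The three callables are placeholders for service wrappers
    (for instance around ARSA, GetEntry and ClustalW).
    """
    accessions = search(query)
    if len(accessions) < 2:
        raise ValueError("a multiple alignment needs at least two sequences")
    sequences = [get_entry(accession) for accession in accessions]
    return align(sequences)


# Stand-ins with made-up data so that the sketch runs offline.
FAKE_DB = {"SEQ1": ">SEQ1\nACGT", "SEQ2": ">SEQ2\nACGA"}


def fake_search(query: str) -> List[str]:
    # Ignores the query and returns every known identifier.
    return sorted(FAKE_DB)


def fake_get_entry(accession: str) -> str:
    return FAKE_DB[accession]


def fake_align(sequences: List[str]) -> str:
    # Not an alignment: merely concatenates the entries.
    return "\n".join(sequences)


if __name__ == "__main__":
    print(alignment_workflow("example keyword", fake_search, fake_get_entry, fake_align))
```

Passing the services in as parameters keeps the workflow itself independent of any particular provider, so a step can be exchanged or a further one (such as a similarity search) added without touching the others.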
 