problem that is defined by this loose specification is to find a sequence of
services that takes makenucseq's output (a nucleotide sequence) as input and
produces fdrawtree's input (a phylogenetic tree) as output.
Applying the simplest synthesis process (simply returning the first solu-
tion that is found) to this specification results in the first of the possible
concretizations that are shown in the lower part of Figure 3.12: A single call
to emma (an interface to ClustalW), which produces a phylogenetic tree in
addition to a multiple sequence alignment, solves the synthesis problem. However,
there are also reasonable solutions that contain not just a single phylogenetic
tree construction service but also varying numbers of, for instance, sequence
editing, reformatting or preprocessing steps, which define alternative analysis
workflows for the same input/output specification.
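
The "first solution" behavior described above can be illustrated with a minimal sketch. It assumes a toy catalog in which every service has exactly one input type and one output type; the type labels and the function name first_solution are hypothetical simplifications, not the actual domain model or synthesis algorithm. A plain breadth-first search stands in for the synthesis process and returns the shortest service sequence it encounters first.

```python
from collections import deque

# Toy service catalog: name -> (input type, output type).
# The type labels are simplified assumptions made for this sketch only;
# they are not the annotations of the real domain model.
SERVICES = {
    "emma": ("nucleotide_sequence", "phylogenetic_tree"),
    "revseq": ("nucleotide_sequence", "nucleotide_sequence"),
    "transeq": ("nucleotide_sequence", "protein_sequence"),
    "fdnacomp": ("nucleotide_sequence", "phylogenetic_tree"),
    "ehmmbuild": ("protein_sequence", "protein_alignment"),
    "fprotpars": ("protein_alignment", "phylogenetic_tree"),
}


def first_solution(services, source, target, max_depth=5):
    """Return the first service sequence found that maps source to target.

    Breadth-first search visits shorter sequences before longer ones,
    so the first hit is also a shortest one. Returns None if no
    sequence of at most max_depth services exists.
    """
    queue = deque([(source, [])])
    while queue:
        current_type, path = queue.popleft()
        # A non-empty path that ends in the target type is a solution.
        if path and current_type == target:
            return path
        if len(path) >= max_depth:
            continue
        # Extend the path by every service that accepts the current type.
        for name, (in_type, out_type) in services.items():
            if in_type == current_type:
                queue.append((out_type, path + [name]))
    return None


print(first_solution(SERVICES, "nucleotide_sequence", "phylogenetic_tree"))
```

With this catalog the search stops at the single-step solution, which mirrors the observation that one call to emma already satisfies the specification; the longer workflows only appear if the search is allowed to continue.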
The lower part of the figure shows four more examples of concretizations
that may result from workflow synthesis for the loosely specified branch shown
above if longer solutions are also considered: The second example consists of
a call to transeq (translating the input nucleotide sequence into a protein
sequence) followed by a call to eomega (an interface to the ClustalO protein
alignment algorithm). In the third example, the reverse complement of the
input sequence is built (revseq) and then used for phylogenetic tree construction
with fdnacomp. In the fourth example, the sequences are translated
into protein sequences (transeq), which are then aligned via ehmmbuild and
used for phylogenetic tree estimation with fprotpars. The last example solution
is a similar, four-step sequence where an additional sequence is pasted
into the input sequences (pasteseq) and where fproml is used instead of
fprotpars for the tree construction. Since EMBOSS provides various tools
for phylogenetic tree construction as well as for the different sequence processing
tasks, the solutions contained in the figure are far from the only
possible ones.
Accordingly, it is desirable to let the synthesis return further solutions in order
to explore the possibilities that the domain model provides. However, when the
synthesis performs a “naive” search in the synthesis universe (i.e., searching
for all possible solutions considering only the input/output specification
defined by the loosely specified branch), the algorithm encounters
more than 1,000,000 results (the default limit of the search) for the synthesis
problem already at search depth 4. While it is in principle possible to increase
the limit and let the algorithm proceed to greater search depths and find fur-
ther solutions, such a large number of solutions is not manageable for the user
anyway. Moreover, although millions of solutions are easily possible with the
described domain model, they are not necessarily desired or adequate. Hence,
it is desirable to influence the synthesis process so that it returns fewer, but
more adequate solutions.
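
How quickly an unconstrained search grows, and how a result limit cuts it off, can be sketched as follows. The eight-service catalog, its type labels, and the function enumerate_solutions are hypothetical and only loosely modeled on the services named above; the real domain model is far larger, which is why the actual search hits its limit at depth 4. The sketch enumerates all type-compatible service sequences up to a maximum depth and stops as soon as the given limit of solutions is reached.

```python
from collections import Counter, deque

# Toy catalog of (name, input type, output type) triples.
# Type labels are assumptions for illustration, not the real annotations.
SERVICES = [
    ("emma", "nucleotide_sequence", "phylogenetic_tree"),
    ("fdnacomp", "nucleotide_sequence", "phylogenetic_tree"),
    ("revseq", "nucleotide_sequence", "nucleotide_sequence"),
    ("pasteseq", "nucleotide_sequence", "nucleotide_sequence"),
    ("transeq", "nucleotide_sequence", "protein_sequence"),
    ("ehmmbuild", "protein_sequence", "protein_alignment"),
    ("fprotpars", "protein_alignment", "phylogenetic_tree"),
    ("fproml", "protein_alignment", "phylogenetic_tree"),
]


def enumerate_solutions(services, source, target, max_depth, limit):
    """Collect service sequences from source to target, shortest first.

    The search considers only the input/output types (the "naive" case)
    and stops once `limit` solutions have been collected.
    """
    solutions = []
    queue = deque([(source, ())])
    while queue:
        current_type, path = queue.popleft()
        if path and current_type == target:
            solutions.append(list(path))
            if len(solutions) >= limit:
                break  # result limit reached, abandon the rest of the search
        if len(path) < max_depth:
            for name, in_type, out_type in services:
                if in_type == current_type:
                    queue.append((out_type, path + (name,)))
    return solutions


all_found = enumerate_solutions(
    SERVICES, "nucleotide_sequence", "phylogenetic_tree",
    max_depth=4, limit=1_000_000,
)
per_depth = Counter(len(solution) for solution in all_found)
for depth in sorted(per_depth):
    print(f"depth {depth}: {per_depth[depth]} solutions")

# The same search with a small limit returns only the shortest solutions.
capped = enumerate_solutions(
    SERVICES, "nucleotide_sequence", "phylogenetic_tree",
    max_depth=4, limit=5,
)
print(capped)
```

Even in this tiny catalog the number of solutions grows with every additional depth level, mainly because type-preserving steps such as revseq and pasteseq can be repeated and combined freely. Raising the limit therefore yields more solutions but not more useful ones, which is the motivation for constraining the search instead.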
The next section demonstrates in greater detail how “playing” with synthesis
configurations and constraints helps to master the enormous workflow
potential. To this end, it describes a simple but effective solution refinement
 