CopyCat and AxParafit tools as well as additional
components that were necessary to implement the
system outlined in Section 3.1.
then be displayed and further analyzed via the
CopyCat evaluation window.
Within the context of an automated Grid-driven simultaneous analysis
of several distinct datasets (and other potential script-based
applications built on CopyCat), the program has been extended with a
command-line interface. As a side effect, this enables CopyCat users
to speed up certain analyses by simply executing a specific
command-line call with a defined set of parameters (please refer to
the CopyCat manual for detailed information on the command-line
options).
CopyCat
Previous versions of CopyCat already provided straightforward
GUI-based functionality for the preparation and analysis of
co-phylogenetic datasets. The CopyCat GUI is implemented in Java
using the Standard Widget Toolkit (SWT). Upon startup, the user can
load the host and parasite trees (represented in the standard Newick
tree format:
http://evolution.genetics.washington.edu/phylip/newicktree.html),
together with a host-parasite association list in a simple plain-text
format that contains one host-parasite association per line.
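
As a minimal sketch of reading such an association list, the following Python function parses one association per line. The text above does not specify how the two names are separated, so the sketch assumes whitespace-separated host and parasite names with the host first; the function name parse_associations is hypothetical and not part of CopyCat.

```python
def parse_associations(text):
    """Parse a plain-text association list into (host, parasite) pairs.

    Assumption (not stated in the source): each non-empty line holds
    exactly two whitespace-separated names, host first, parasite second.
    """
    pairs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 names, found {len(fields)}"
            )
        pairs.append((fields[0], fields[1]))
    return pairs


# Hypothetical example input: host H2 carries two parasites.
example = "H1 P1\nH2 P2\n\nH2 P3\n"
associations = parse_associations(example)
```

Rejecting malformed lines early, rather than silently skipping them, makes it easier to spot a name that contains a space or a missing partner before an analysis is started.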
When starting an analysis, the user can now utilize a new Grid
interface that connects CopyCat to the gridified program AxParafit.
Instead of directly calling the AxParafit executable, the interface
invokes a Perl script (AxParafit.pl) which hides the Grid-related
parts from the user and CopyCat. By delegating the invocation process
to a script, dependencies between the user front-end and the Grid
software are minimized. Thus, future modifications such as the
development of a Web interface for job submissions (see Conclusion)
or the use of a different middleware system remain possible.
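
This kind of delegation can be pictured with a minimal sketch. It is not CopyCat's actual code (CopyCat is written in Java and the wrapper in Perl): a front-end starts a wrapper script as a child process and relays every line the script prints to a logging callback. The function name run_and_relay and the demo command are hypothetical; a small Python one-liner stands in for AxParafit.pl so that the sketch runs without any Grid middleware.

```python
import subprocess
import sys


def run_and_relay(command, on_line):
    """Run `command` and pass each line of its output to `on_line`.

    Standard error is merged into standard output so that all messages
    arrive in one stream. Returns the exit code of the child process.
    """
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    with proc.stdout:
        for line in proc.stdout:
            on_line(line.rstrip("\n"))
    return proc.wait()


# Stand-in for the wrapper script: prints two made-up status messages.
demo_command = [
    sys.executable,
    "-c",
    "print('job 1 submitted'); print('job 1 finished')",
]

log = []  # a GUI would append to its log window instead
exit_code = run_and_relay(demo_command, log.append)
```

Because the front-end only depends on the command line and the text the script prints, the script's internals (for instance the middleware it talks to) can change without touching the front-end.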
The AxParafit.pl script manages the entire execution of AxParafit on
the Grid and, at the same time, writes status updates to the standard
output stream. Because CopyCat listens to the output stream of the
external programs it invokes, it also receives the status updates
generated by the Perl script and writes them to the CopyCat
log-message window, thus keeping the user informed about the progress
of Grid jobs. Upon termination of the script, the output of the Grid
jobs (individual tests of host-parasite associations), as well as the
global significance test results, are read by CopyCat. The results can
then be displayed and further analyzed via the CopyCat evaluation
window.
Application-Side Modifications of AxParafit
As outlined in another section, the parallel MPI implementation of
AxParafit uses a simple master-worker scheme. To devise a distributed
version of AxParafit, we modified the code as follows. First, we
modified the global test of congruence in AxParafit to write an
additional file called “gridData.RUN-ID”, where RUN-ID is the output
file name appendix for a specific analysis that is passed to AxParafit
via a command line parameter (for details see the AxParafit Manual at
http://icwww.epfl.ch/~stamatak/). This file contains the data needed
to make scheduling decisions for the distributed computation of the n
individual tests of host-parasite associations, i.e., the number of
jobs nz, e.g., Jobs=2000, and the approximate execution time per job
in seconds, e.g., ComputeTime=10. This data can then be used to
determine the level of granularity for individual Grid tasks, since in
the current example the scheduling overhead induced by distributing
2,000 jobs of 10 seconds each, along with the comparatively large
input datasets, on the Grid would be immense. We have thus extended
the implementation of the individual host-parasite association tests
in AxParafit by two additional command line parameters, -l (lower
limit) and -u (upper limit). These parameters allow for computation
of several host-parasite associations in one
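
The granularity decision described above can be illustrated with a short Python sketch. It rests on several assumptions that are not confirmed by the text: that the gridData file holds one key=value entry per line with integer values, that a target run time per Grid task is chosen by the user, and that the limits passed via -l and -u are zero-based and inclusive (the AxParafit manual defines the actual semantics). The function names parse_grid_data and plan_batches are hypothetical.

```python
def parse_grid_data(text):
    """Read Jobs and ComputeTime from key=value lines (assumed layout)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = int(value)
    return values["Jobs"], values["ComputeTime"]


def plan_batches(n_jobs, seconds_per_job, target_seconds):
    """Group individual tests into batches of roughly target_seconds each.

    Returns a list of (lower, upper) index pairs, assumed here to be
    zero-based and inclusive, one pair per Grid task.
    """
    per_task = max(1, target_seconds // max(1, seconds_per_job))
    batches = []
    lower = 0
    while lower < n_jobs:
        upper = min(lower + per_task, n_jobs) - 1
        batches.append((lower, upper))
        lower = upper + 1
    return batches


# Values from the example in the text; the one-hour target is an assumption.
n_jobs, seconds_per_job = parse_grid_data("Jobs=2000\nComputeTime=10\n")
batches = plan_batches(n_jobs, seconds_per_job, target_seconds=3600)
```

With these numbers each task bundles 360 tests, so the 2,000 individual tests are covered by 6 Grid tasks instead of 2,000, which is the kind of coarsening the two limit parameters make possible.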