appropriate sequence data, allow for large-scale
phylogenetic analyses with several hundred or
thousand sequences (Stamatakis, 2006). Thus,
large-scale co-phylogenetic studies have, in principle,
become feasible. However, most common
co-phylogenetic tools or methods such as BPA,
Component, TreeMap, TreeFitter (cf. review in
Charleston, 2006) or Tarzan (Merkle, 2006) are
not able to handle datasets with a large number of
organisms or have not been tested in this regard
with respect to their statistical properties and
scalability. Faster methods based on topological
distances between trees, such as Icong (de Vienne,
2007), are even limited to the analysis of bijective
associations. In this context, bijectivity means
that each parasite is associated with exactly one
host, and vice versa. Hence, there is a
performance and scalability gap between tools
for phylogenetic analysis and meta-analysis. The
capability to analyze large datasets is important to
infer “deep co-phylogenetic” relationships which
can otherwise not be assessed (Meier-Kolthoff et
al., 2007; Stamatakis et al., 2007). Deep relationships
are relationships that determine the extant
associations between parasite and host organisms
at a high taxonomic level, such as families
and orders.
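The bijectivity restriction described above can be illustrated with a minimal Python sketch. The function name is_bijective and the organism labels are hypothetical and do not come from any of the cited tools. The check only considers organisms that appear in the association list and assumes that each pair is given as (host, parasite).

```python
from collections import Counter


def is_bijective(associations):
    """Return True if every listed host and parasite occurs in exactly one pair.

    `associations` is an iterable of (host, parasite) name pairs.
    Organisms that appear in no pair are not visible to this check.
    """
    pairs = set(associations)  # a pair listed twice counts only once
    host_counts = Counter(host for host, _ in pairs)
    parasite_counts = Counter(parasite for _, parasite in pairs)
    return (all(n == 1 for n in host_counts.values())
            and all(n == 1 for n in parasite_counts.values()))


# Hypothetical example data.
one_to_one = [("H1", "P1"), ("H2", "P2")]
shared_host = [("H1", "P1"), ("H1", "P2"), ("H2", "P3")]

print(is_bijective(one_to_one))
print(is_bijective(shared_host))
```

The second list is not bijective because host H1 carries two parasites; a method restricted to bijective associations could not analyze it.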
Parafit (Legendre, 2002) and the analogous
highly optimized AxParafit (Stamatakis et al.,
2007) program implement a statistical test to assess
hypotheses of global congruence between trees
as well as the impact of individual associations.
This test is based on the permutation of the entries
in the association matrix. The null hypothesis is
that the global similarity between the trees, or the
respective impact of an individual local association
on the similarity, is not larger than expected by
pure chance. Extensive simulations have shown
that the Parafit test is statistically well-behaved
and yields acceptable error rates. The method has
been successfully applied in a number of biological
studies (Hansen et al., 2003; Ricklefs et al.,
2004; Meinilä et al., 2004).
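The general shape of such a permutation test can be sketched in Python as follows. This is not the Parafit implementation: the statistic diagonal_links is a deliberately simple placeholder, and the permutation scheme (shuffling the entries within each row independently) is an assumption, since the text above only states that the entries of the association matrix are permuted. All function and parameter names are hypothetical.

```python
import random


def permutation_p_value(matrix, statistic, n_permutations=999, seed=0):
    """One-sided permutation p-value for statistic(matrix).

    Assumption: each permutation shuffles the entries within every row
    of the association matrix independently.
    """
    rng = random.Random(seed)
    observed = statistic(matrix)
    at_least_as_large = 1  # the observed value is counted as one outcome
    for _ in range(n_permutations):
        permuted = [rng.sample(row, len(row)) for row in matrix]
        if statistic(permuted) >= observed:
            at_least_as_large += 1
    return at_least_as_large / (n_permutations + 1)


def diagonal_links(matrix):
    """Placeholder statistic (NOT the Parafit statistic).

    Counts associations on the main diagonal, i.e. links between the
    i-th host and the i-th parasite of a toy dataset.
    """
    return sum(matrix[i][i] for i in range(len(matrix)))


# Toy association matrix: host i is linked to parasite i only.
size = 6
toy = [[1 if i == j else 0 for j in range(size)] for i in range(size)]

p = permutation_p_value(toy, diagonal_links)
print(f"p-value: {p:.3f}")
```

A small p-value leads to rejection of the null hypothesis that the observed statistic is no larger than expected by chance. In a real analysis, the statistic would be computed from the two trees and the association matrix rather than from the matrix alone.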
In addition, the type-II statistical error of
Parafit decreases with the size of the dataset (see
Legendre, 2002), i.e., this approach scales well
on large phylogenies of hosts and parasites in
terms of accuracy. The AxParafit program is a
highly optimized version of Parafit which yields
exactly the same results. The sequential version
of AxParafit is up to 67 times faster than the
original Parafit implementation, and the speedup
grows with input size because of
higher cache efficiency. The speedup of AxParafit
has been achieved via low-level optimizations
in C, re-design of the algorithm, omission of
redundant code, reduction of memory footprint,
and integration of highly optimized BLAS (Basic
Linear Algebra Subprograms, http://www.netlib.org/blas/)
routines.
Earlier work describes these optimizations
together with a corresponding performance study.
Moreover, the program was used to conduct the
largest co-phylogenetic analysis on real-world
data to date. The underlying data were smut fungi
and their respective host plants (Stamatakis et al.,
2007). Smut fungi are parasitic mushrooms that
cause plant diseases. For economically important
hosts, such as barley and other cereals, smut fungi
can for instance cause considerable yield losses
(Thomas and Menzies, 1997).
Workflow of a Co-Phylogenetic
Analysis with CopyCat and AxParafit
In this section, we provide an outline of the
workflow for a full co-phylogenetic analysis
using CopyCat(AxParafit). The input for a co-
phylogenetic analysis with CopyCat(AxParafit)
consists of the host and parasite phylogenies, which may
have branch lengths, depending on which method or
model was used to calculate the trees. The aforementioned
associations are represented as a plain
text file containing a list of sequence (organism)
name pairs of hosts and parasites, i.e., an adjacency
list. This input data representation is henceforth
also referred to as the list of host-parasite associations.
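As an illustration of this input representation, the following Python sketch reads an adjacency list and converts it into a binary association matrix. The exact file layout is an assumption: one whitespace-separated pair per line, with the host name first. The function names and organism labels are hypothetical and are not taken from CopyCat or AxParafit.

```python
def parse_associations(text):
    """Parse 'host parasite' pairs, one pair per non-empty line (assumed layout)."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        host, parasite = line.split()
        pairs.append((host, parasite))
    return pairs


def association_matrix(pairs):
    """Build a binary matrix with hosts as rows and parasites as columns."""
    hosts = sorted({host for host, _ in pairs})
    parasites = sorted({parasite for _, parasite in pairs})
    row = {host: i for i, host in enumerate(hosts)}
    col = {parasite: j for j, parasite in enumerate(parasites)}
    matrix = [[0] * len(parasites) for _ in hosts]
    for host, parasite in pairs:
        matrix[row[host]][col[parasite]] = 1
    return hosts, parasites, matrix


# Hypothetical list of host-parasite associations.
example = """\
host_A parasite_1
host_A parasite_2
host_B parasite_3
"""

hosts, parasites, matrix = association_matrix(parse_associations(example))
for host, links in zip(hosts, matrix):
    print(host, links)
```

Storing the associations as a list keeps the input file compact for sparse data; the matrix form is what a permutation test over the associations would operate on.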