Parallel and Distributed Computation of Large Phylogenetic Trees - Parallel Computing for Bioinformatics and Computational Biology

Biomedical Engineering Reference

In-Depth Information

Initially, a compressed (zipped) alignment file is transferred to all workers, which

start with the computation of a local parsimony starting tree. The parsimony tree is

then returned to the master as in the parallel program.

However, the parallel and distributed algorithms differ in two important aspects

which have a positive impact on communication costs.

First, RAxML@home does not implement phase IIa but only phase IIb of the

parallel algorithm, to avoid frequent communication and frequent exchange of tree

topologies between master and workers.

Secondly, the lists containing the 20 best trees, irrespective of the number of work-

ers, are optimized locally at the workers after completion of subtree rearrangements.

The branch lengths of the trees in the list are optimized less exhaustively than in the

sequential and parallel program. After this initial optimization, only the best local tree

is thoroughly optimized and returned to the master.

This induces some computational overhead and a slower improvement rate of the

likelihood during the initial optimization phase (phase IIa of the parallel program)

but remains within acceptable limits.

A more detailed technical description of distributed RAxML-III can be found in

Ref. [35].

14.6 FUTURE DEVELOPMENTS

Future developments cover a wide range of issues such as appropriate visualization

tools for huge trees, further refinement of evolutionary models, as well as new algo-

rithms and computer systems for the inference of huge trees. Despite the fact that

visualization and modeling are very important issues, they cannot be discussed here

as they are outside the scope of this chapter. Sanderson and Driskell [36] provide a

more comprehensive review of the challenges for large phylogeny reconstruction.

Consequently, this final section covers only future algorithmic and technical

developments of high performance phylogenetic inference.

14.6.1 Algorithmic Developments

The main goal of future algorithmic improvements is to compute better trees, that

is, with higher accuracy, in less time than the currently fastest and most accurate pro-

grams. Given the at least quadratic complexity of search algorithms, one very evident

idea, at least for a computer scientist, to further accelerate tree building algorithms

consists in devising a divide-and-conquer approach.

In the concrete case, this means that the alignment has to be (intelligently)

split up into overlapping subalignments. Thereafter, a subtree is inferred for each

subalignment by application of some fast base method, for example, PHYML or

RAxML-III. Finally, the subtrees are merged into a single comprehensive tree (also

called supertree) by appropriate supertree methods. As the subtrees overlap, that is,

share some common organisms, they provide information to the supertree method

Search WWH ::

Custom Search

Home