High-Performance Phylogeny Reconstruction Under Maximum Parsimony - Parallel Computing for Bioinformatics and Computational Biology

Biomedical Engineering Reference

In-Depth Information

once is said to be a nonsingleton state . A site with at most one nonsingleton states is

said to be a parsimony-uninformative site because the state changes at such kind of a

site can always be explained by the same number of substitutions in all topologies. At

the lower levels of B&B phylogeny reconstruction, only a few sites are parsimony-

informative. However, with the addition of taxa, many sites turn from parsimony-

uninformative to parsimony-informative. Hence, we may compute at which level

a site turns into parsimony-informative, then reorder sites so that at each level all

the parsimony-informative sites are kept in a contiguous segment of memory. By

reordering sites, not only the computation on parsimony-uninformative sites is saved,

but also the ratio of cache misses is greatly reduced.

16.3.3 A Fast Algorithm to Compute Tree Length

In a B&B search, an enormous number of trees must be evaluated. Given an original

tree, how can we compute the scores of each new tree generated by adding a taxon

in the original one? As described in Section 16.2.1, Fitch's method involves one

bottom-up pass and one top-down pass of the tree. Each pass computes a set of

states for each internal node by different rules, the states obtained in the first pass are

preliminary states and the states obtained in the second pass are final states . Goloboff

[29] proposed a method to preprocess the original tree in two passes, which takes

constant time to compute the score for each new tree. Gladstein [30] described an

incremental algorithm based on preliminary state sets obtained from Fitch's first pass.

In practice, Goloboff's method works better than that of Gladstein's. We developed

an approach that requires preprocessing the original tree in one pass and for each new

tree it takes constant time to compute the score.

Our approach is similar to Fitch's algorithm. Our first pass is identical to Fitch's

first pass. However, our second pass uses the rules of Fitch's first pass — not Fitch's

second pass rules. We obtain a set of states for each edge, which are the preliminary

states of the root. Therefore, when inserting a new taxon in the original tree at an

edge, we only compare the states of the new taxon and the states of that edge. We

obtain the same result as our bottom-up pass does on the new tree. If the result of the

new tree is kept in memory, when we decompose this new tree later, only the top-

down pass is required. Thus, in B&B search, our method saves one pass compared

with Goloboff's method. Besides the B&B search, our method can also be applied to

heuristic-search-based SPR and TBR rearrangement operations.

16.3.4 Parallel Implementation on Symmetric Multiprocessors (SMPs)

To utilize the computation power of parallel computers, we implement the B&B

phylogeny reconstruction on Cache-Coherent Uniform Memory Access (CC-UMA)

SMPs. In the parallel implementation, each processor selects different active nodes,

then processes the computation on it. We use the SPMD (Single Program, Multiple

Data) asynchronous model in which each processor works at its own pace and does

not have to wait at predetermined points for predetermined data to become available.

As the B&B search space tends to be highly irregular, any static distribution of search

Search WWH ::

Custom Search

Home