High-Performance Phylogeny Reconstruction Under Maximum Parsimony - Parallel Computing for Bioinformatics and Computational Biology

Biomedical Engineering Reference

In-Depth Information

16.3 EXACT MP: PARALLEL BRANCH AND BOUND

Although it takes polynomial time to compute the tree cost by Fitch's method [16],

it is still very time-consuming to compute the exact MP by exhaustive search due

to the enormous size of search space. Hence we use B&B to prune the search space

in phylogeny reconstruction [25, 26]. The underlying idea of the B&B algorithm is

successive decomposition of the original problem into smaller disjoint subproblems

and pruning subproblems whose lower bound is greater than the upper bound until

all optimal solutions are found.

16.3.1 Basic Issues in B&B Approach

Our B&B approach has five aspects that affect the performance of the algorithms:

branching scheme, search strategy, lower bounding function, initial global upper

bound, and the data structure. We will discuss these five aspects in the following

sections.

16.3.1.1 Branching Scheme The branch scheme decides how to decompose

a subproblem in the search space. Here, each subproblem is associated with a partial

tree and the aim is to find the exact MP score among those trees built from the partial

tree. Our branching scheme employs the same mechanism to generate all possible

unrooted binary trees for a given set of taxa. Consider the unrooted tree for three taxa.

The remaining n

3 taxa are added to the tree in stepwise fashion as described in

Section 16.2.3. Each new position for taxon i of the partial tree is considered a sub-

problem. Thus, a subproblem associated with the original partial tree is decomposed

into a set of disjoint subproblems, each associated with a new partial tree.

−

16.3.1.2 Search Strategy The search strategy decides which of the currently

open subproblems to be selected for decomposition. The two strategies most com-

monly used are depth-first search (DFS) and best-first search (BeFS). DFS is a

space-saving strategy and BeFS is targeted more toward a better global upper bound.

In the case when the initial global upper bound obtained by heuristic approaches is

exactly optimal or very close to exact optimal value, there is no significant difference

in the number of examined subproblems between DFS and BeFS search. Therefore

DFS has more advantage for reasons of space efficiency. As our experiment shows

that heuristics can provide a very good solution, we employ DFS as our primary B&B

search strategy and adopt BeFS as a secondary strategy to break ties.

16.3.1.3 Lower Bounding Function of the Subproblems Hendy and

Penny [24] use the cost of the associated partial tree as the lower bound of a subprob-

lem. Purdom et al. [27] used the sum of the single-column discrepancy and the cost

of the associated tree as the lower bound. For each column (character), the single-

column discrepancy is the number of states that do not occur among the taxa in the

associated tree but only occur among the remaining taxa. We employ Purdom's lower

bounding function as it is much tighter than the one described by Hendy and Penny.

Search WWH ::

Custom Search

Home