High-Performance Phylogeny Reconstruction Under Maximum Parsimony - Parallel Computing for Bioinformatics and Computational Biology - page 384

Biomedical Engineering Reference

In-Depth Information

( a )

Mean relative subproblem size

100

Rec−DCM3

DCM3

DCM2

90

80

70

60

50

40

30

20

10

0

1

2

3

4

5

6

7

8

9

10

Dataset#

( b )

Number of subproblems

15

Rec−DCM3

DCM3

DCM2

10

5

0

1

2

3

4

5

6

7

8

9

10

Dataset#

Figure 16.7 Comparison of DCM2, DCM3, and Recursive-DCM3 decompositions. DCM2

decompositions on datasets 5 to 10 could not be computed due to memory limitations.

16.4.1.4 Subtree Construction and Assembly Once the data set is decom-

posed into overlapping subsets A 1 , A 2 , ... , A m (for us, m

4 is typical), subtrees are

constructed for each subset, A i , using the chosen “base method,” and then combined

using the Strict Consensus Merger [31, 32] to produce a tree on the combined data

set. The proof that the resulting tree is accurate (i.e., agrees, with high probability

and in the limit, with the unknown underlying “true tree”) is similar to the argument

shown in Ref. [31].

Our Rec-I-DCM3 algorithm takes as input the set S

≤

of n aligned

biomolecular sequences, the chosen base method, and a starting tree T . In our experi-

ments, we have usedTNT (with default settings) as our basemethod, as it is the hardest

to improve (in comparison, the PAUP* implementation of the parsimony ratchet [36]

is easier to improve). Our algorithm produces smaller subproblems by recursively

applying the centroid-edge decomposition until each subproblem is of size at most

k . The subtrees are then computed, merged, and resolved (from the bottom-up, using

={

s 1 , ... , s n }

Next Page

Parallel Computing for Bioinformatics and Computational Biology

Search WWH ::

Custom Search

Home