Phylogenetic Parameter Estimation on COWs - Parallel Computing for Bioinformatics and Computational Biology

Biomedical Engineering Reference

In-Depth Information

where the runtime with one processor is typically measured with the corresponding

sequential programmissing the overhead of a parallel version. The success of a parallel

implementation often depends on the granularity, that is, the amount of independent

problems that are executed by the processors until communication is necessary, as

well as the way the workload is distributed among the available processors.

In the following, the parameter estimation part of the TREE-PUZZLE program

[12] serves as an example to demonstrate the impact of task granularity and the

scheduling algorithm on the parallel performance. TREE-PUZZLE implements a

maximum likelihood (ML) based method, quartet puzzling [13], to reconstruct evo-

lutionary trees from (biological) sequences. To infer ML based trees, it is necessary to

estimate the model parameters by ML method. In sequential programs, this is a time-

consuming task and several heuristics are employed [7, 14, 15]. The parallelization

of the sequential ML based optimization technique is nontrivial, when developing

parallel phylogenetic analysis software.

In the following, we will examine the parallel version of TREE-PUZZLE [12]

that uses MPI (Message Passing Interface [16, 17]) for parallel execution on a

heterogeneous and a homogeneous COW.

15.2 PHYLOGENETIC TREE RECONSTRUCTION USING QUARTET

PUZZLING AND ITS PARALLELIZATION

15.2.1 Modeling Molecular Evolution

Here, we give a short introduction to the probabilistic framework that is used in phylo-

genetic analysis. In the following, we confine ourselves to DNA sequence evolution.

However, we note that everything transforms easily to model protein sequence evo-

lution. As input serves a DNA sequence alignment (cf. Table 15.1) of n sequences

S 1 , ... , S n with length L (measured in nucleotides or base pairs, bp). All characters

within one column trace back to one common ancestor nucleotide.

The ML reconstruction of a tree with n sequences requires a model of evolu-

tion describing how substitutions from one character to another occur at a given

position in a sequence alignment A (cf. Table 15.1). Typically, this is modeled by

time-homogeneous reversible-time Markov process [18, 19]. This stochastic process

TABLE 15.1 Alignment Example for Eight DNA Sequences S 1 , ... , S 8 1

S 1 …TGT C T CA A G GA C TAAGCCAT GCA A GT GT AA GTAT A A GTAT …

S 2 …TGT C T AT A G GA C TAAGCCAT GCA A GT GC AA GTAT G A GTGA …

S 3 …TGT C T CA A A GA T TAAGCCAT GCA T GT CT AA GTAT A A ACAT …

S 4 …TGT C T CA A G GA C TAAGCCAT GCA A GT GT AA GTAT G A GTGA …

S 5 …TGT T T AA A G GA C TAAGCCAT GCA A GT GC AA GTAT G A GTGA …

S 6 …TGT C T CA A A GA C TAAGCCAT GCA T GT CT AA GTAT A A ACAT …

S 7 …TGT C T CA A A GA T TAAGCCAT GCA T GT CT AA GTAT A A ACGT …

S 8 …TGT C T CA A A GA C TAAGCCAT GCA A GT CT AA GTAT A A GTGT …

1 Columns in boldface contain substitutions

Search WWH ::

Custom Search

Home