Biomedical Engineering Reference
as the substitution probability space, the Bayesian paradigm considers the prior distributions $\pi(T)$, $\pi(t)$, and $\pi(R)$ to model the a priori information on $T$, $t$, and $R$, respectively. Once an appropriate model of molecular evolution $M$ has been selected, the prior distributions are combined with the likelihood function to provide a posterior density function $B(T, S, M)$ that represents the probability distribution of phylogenies conditional on the observed data matrix $S$, the model $M$, and the prior distributions $\pi(T)$, $\pi(t)$, and $\pi(R)$. The goal of the Bayesian paradigm is to maximize $B(T, S, M)$.
According to Bayes' theorem, given a phylogeny $T_i$ and denoting by $t_i$ and $R_i$ the corresponding subspaces of edge weights and substitution probabilities, the posterior probability $B(T_i, S, M)$ of $T_i$ can be written as:
$$
B(T_i, S, M) = \frac{L_R(T_i, S, M)\, \pi(T_i)}{\sum_{T_j \in \mathcal{T}} L_R(T_j, S, M)\, \pi(T_j)} \tag{8.17}
$$
where $\pi(T_i)$ denotes the prior probability of $T_i$, and $L_R(T_i, S, M)$ denotes the integral of the likelihood function $L(T_i, S, M)$ over all possible edge weights and substitution probabilities [41], i.e.,

$$
L_R(T_i, S, M) = \int_{R_i} \int_{t_i} L(T_i, S, M)\, \pi(t')\, \pi(R')\, dt'\, dR' .
$$
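The integral defining $L_R$ is the expectation of the likelihood under the priors $\pi(t)$ and $\pi(R)$, so one simple (if inefficient) way to approximate it is plain Monte Carlo: draw parameters from the prior and average the likelihood. The Python sketch below rests on strong simplifying assumptions that are not in the text: the "phylogeny" is just two aligned DNA sequences joined by a single edge of length $t$; the substitution model is Jukes-Cantor (JC69), which has no free substitution parameters, so the integral over $R$ drops out; the exponential prior on $t$ (rate 10) and the 20-site data set are arbitrary choices. A trapezoidal quadrature of the same integral is included only as a cross-check.

```python
import math
import random

def jc69_lik(t, n_same, n_diff):
    """Likelihood of two aligned DNA sequences joined by one edge of length t
    (expected substitutions per site) under JC69 with base frequencies 1/4.
    n_same: sites with identical bases; n_diff: sites with differing bases."""
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)  # joint prob. of a given identical pair, e.g. (A, A)
    p_diff = 0.25 * (0.25 - 0.25 * e)  # joint prob. of a given different pair, e.g. (A, G)
    return p_same ** n_same * p_diff ** n_diff

def marginal_lik_mc(n_same, n_diff, rate=10.0, n_draws=100_000, seed=1):
    """Monte Carlo estimate of L_R = integral of L(t) * pi(t) dt, drawing t' ~ pi(t)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        t = rng.expovariate(rate)  # edge weight drawn from the (assumed) exponential prior
        total += jc69_lik(t, n_same, n_diff)
    return total / n_draws

def marginal_lik_quad(n_same, n_diff, rate=10.0, t_max=5.0, n_steps=20_000):
    """Trapezoidal-rule value of the same integral, used only as a cross-check."""
    h = t_max / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        t = k * h
        weight = 0.5 if k in (0, n_steps) else 1.0
        total += weight * jc69_lik(t, n_same, n_diff) * rate * math.exp(-rate * t)
    return total * h

# Hypothetical data: 20 aligned sites, 17 identical and 3 different.
print(marginal_lik_mc(17, 3), marginal_lik_quad(17, 3))
```

Naive sampling from the prior becomes very inefficient when the data are informative and the likelihood is concentrated in a small region of parameter space, which is one reason the integral is rarely evaluated this directly.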
Hence, finding the optimal solution for the Bayesian paradigm means finding the phylogeny $T_i$, together with its edge weights and substitution probabilities, that globally maximizes the posterior probability distribution of phylogenies.
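When the candidate phylogenies can be listed and their integrated likelihoods $L_R(T_i, S, M)$ are known, Eq. (8.17) reduces to normalizing prior-weighted likelihoods, and the maximum a posteriori phylogeny is simply the arg max. A minimal Python sketch, assuming the three unrooted topologies of four taxa, made-up log-likelihood values, and a uniform topology prior; working in log space with the log-sum-exp trick is an implementation choice to avoid numerical underflow, not something the text prescribes.

```python
import math

def posterior(log_marginal_lik, log_prior):
    """Posterior B(T_i, S, M) for every tree T_i of a finite set, following Eq. (8.17).
    Both arguments map a tree label to log L_R(T_i, S, M) and log pi(T_i)."""
    # log of the numerator L_R(T_i, S, M) * pi(T_i) for each tree
    log_num = {t: log_marginal_lik[t] + log_prior[t] for t in log_marginal_lik}
    # log of the denominator (sum over all T_j), computed with log-sum-exp
    m = max(log_num.values())
    log_den = m + math.log(sum(math.exp(v - m) for v in log_num.values()))
    return {t: math.exp(v - log_den) for t, v in log_num.items()}

# Hypothetical example: the three unrooted topologies of taxa A, B, C, D.
trees = ["AB|CD", "AC|BD", "AD|BC"]
log_lik = {"AB|CD": -1000.0, "AC|BD": -1002.0, "AD|BC": -1003.0}  # invented values
log_pri = {t: math.log(1 / 3) for t in trees}                       # uniform prior

post = posterior(log_lik, log_pri)
map_tree = max(post, key=post.get)  # maximum a posteriori phylogeny
print(map_tree, round(post[map_tree], 3))  # AB|CD 0.844
```

Under a uniform prior, $\pi(T_i)$ cancels, so the posterior ranks trees exactly as their integrated likelihoods do.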
Since finding the maximum a posteriori phylogeny implicitly requires solving the likelihood paradigm, solving the Bayesian paradigm is NP-hard [29].
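The sum in the denominator of Eq. (8.17) runs over every phylogeny in $\mathcal{T}$. The standard count of unrooted binary topologies on $n$ labelled taxa, $(2n-5)!!$, shows why that sum cannot be enumerated for realistic data, even before edge weights and substitution probabilities are integrated out. The short Python function below only illustrates the size of the search space; it is not a proof of the NP-hardness result cited above.

```python
def num_unrooted_binary_trees(n):
    """Number of unrooted, fully resolved topologies on n labelled taxa: (2n-5)!!, n >= 3."""
    if n < 3:
        raise ValueError("at least 3 taxa are required")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # multiply the odd numbers 3, 5, ..., 2n-5
        count *= k
    return count

for n in (4, 10, 20, 50):
    print(n, num_unrooted_binary_trees(n))
```

Four taxa give 3 topologies and ten taxa already give 2,027,025; the count grows faster than exponentially in $n$.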
The recursive nature of the likelihood function and the intractability of computing the denominator of Bayes' theorem rule out an analytical solution of the Bayesian paradigm. Hence, the maximum a posteriori phylogeny is usually computed by means of a Markov chain Monte Carlo (MCMC) algorithm [30], i.e., an algorithm that samples $B(T, S, M)$ through a stochastic generation of phylogenies in $\mathcal{T}$ [49, 52, 83]. MCMC sidesteps the denominator of Eq. (8.17) because moves between phylogenies are accepted or rejected on the basis of posterior ratios, in which the denominator cancels. Sampling $B(T, S, M)$ is nevertheless extremely time consuming; therefore, Bayesian estimations may take weeks [42]. However, as observed by Yang [82] and Huelsenbeck et al. [41, 43], the sampling process also has the clear benefit of providing a measure of the reliability of the best solution found so far. In fact, by sampling stochastically around the (locally best) maximum a posteriori phylogeny $T$, the Bayesian paradigm can determine support values for the subtrees of $T$, i.e., measures of the posterior probability that the subtrees are true.
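A toy version of this sampling idea is sketched below in Python. A Metropolis algorithm moves among the 15 unrooted topologies on five taxa and accepts a proposed tree $T'$ with probability $\min\{1, B(T', S, M)/B(T, S, M)\}$, so the denominator of Eq. (8.17) is never computed; the support of each subtree (split) is then estimated as the fraction of sampled trees that contain it. Everything problem-specific is hypothetical: the log-posterior score is invented, the proposal draws uniformly among the other topologies rather than using the local rearrangement moves of real programs, and edge weights and substitution probabilities are not sampled at all.

```python
import itertools
import math
import random
from collections import Counter

TAXA = "ABCDE"

def split_name(split):
    return "".join(sorted(split))

def five_taxon_trees():
    """All 15 unrooted binary topologies on five taxa. A tree ((a,b),e,(c,d)) is stored
    as the frozenset of its two non-trivial splits, each given by its 2-taxon side."""
    pairs = [frozenset(p) for p in itertools.combinations(TAXA, 2)]
    trees = {frozenset((p, q)) for p, q in itertools.combinations(pairs, 2) if not (p & q)}
    # sort for a reproducible order (iteration order of string sets varies between runs)
    return sorted(trees, key=lambda tree: sorted(split_name(s) for s in tree))

# Hypothetical unnormalized log posterior log[L_R(T, S, M) * pi(T)]: an invented score
# favouring trees that contain the splits AB and DE. A real analysis would evaluate
# the integrated likelihood and the topology prior instead.
FAVOURED = {frozenset("AB"), frozenset("DE")}

def log_post(tree):
    return 2.0 * len(tree & FAVOURED)

def metropolis(trees, n_iter=100_000, burn_in=1_000, seed=42):
    """Sample topologies in proportion to exp(log_post) using a symmetric proposal."""
    rng = random.Random(seed)
    current = rng.choice(trees)
    samples = []
    for it in range(n_iter):
        proposal = rng.choice([t for t in trees if t != current])  # uniform over the others
        # Only the posterior ratio is needed: the denominator of Bayes' theorem cancels.
        accept_prob = math.exp(min(0.0, log_post(proposal) - log_post(current)))
        if rng.random() < accept_prob:
            current = proposal
        if it >= burn_in:
            samples.append(current)
    return samples

def split_support(samples):
    """Fraction of sampled trees containing each split (posterior support of that subtree)."""
    counts = Counter(split for tree in samples for split in tree)
    return {split_name(s): c / len(samples) for s, c in counts.items()}

trees = five_taxon_trees()
samples = metropolis(trees)
support = split_support(samples)
map_tree = Counter(samples).most_common(1)[0][0]  # most frequently sampled topology
print(sorted(split_name(s) for s in map_tree))
print({name: round(p, 2) for name, p in sorted(support.items())})
```

Because the invented score is known here, the estimated supports can be checked against exact values obtained by enumerating all 15 trees; in a real analysis only the sampled frequencies are available.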
The Bayesian paradigm is possibly the most complex among the phylogenetic estimation paradigms currently available in the literature on molecular phylogenetics. The recent computational advances obtained by Ronquist and Huelsenbeck [65] sped up the execution of the MCMC algorithm and widened the use of the Bayesian paradigm. However, the lack of a systematic investigation of its statistical