accepted moves to proposed moves should be close to neither 0 nor 1. If
the target distribution is multivariate Gaussian, then the optimal acceptance rate
is 0.23 (Gelman et al. 2004), and a value of approximately 20% has been shown
to work well for non-Gaussian posterior PDFs as well (Geyer and Thompson
1995). Roberts et al. (1997) determined that, in the limit of a large-dimensional state
space and for posterior distributions in which each variable is i.i.d., the optimal
acceptance rate is precisely 23.4%. Roberts and Rosenthal (2001) showed that the
optimal multivariate Normal proposal covariance $\Sigma_p$ should be proportional to the
covariance $\Sigma$ of the target distribution; $\Sigma_p = k\Sigma$. Given a multivariate Normal
target density with covariance $\Sigma$ and dimension $d$, the optimal Normal
proposal covariance is

$$\Sigma_p = \frac{(2.38)^2}{d}\,\Sigma \qquad (3.8)$$
Of course, the difficulty is that (1) $\Sigma$ is typically not known a priori and (2) there
is no guarantee the target distribution is multivariate Normal. Though care must be
taken not to blindly tune to the "optimal" acceptance rate, the theory laid out in
Roberts et al. (1997) and Roberts and Rosenthal (2001) serves as a useful starting
point.
In practice, the following procedure has proven to work well for most problems
(a minimal code sketch follows the list):
(i) Run a pilot MCMC chain that generates an ensemble of realizations of
$P(y \mid x)$ and compute an approximate $\Sigma$, assuming the posterior is multivariate Normal.
(ii) Construct an initial $\Sigma_p$ from (3.8) above.
(iii) Monitor the acceptance rate in the early stages of the algorithm and ensure it
stabilizes between 10 and 60%.
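The following Python sketch walks through steps (i)-(iii) using a random-walk Metropolis sampler; the stand-in log-posterior log_post, the chain lengths, and the initial diagonal proposal are all assumptions of this example, not prescriptions from the text:

import numpy as np

# Stand-in 3-D Gaussian log-posterior for P(y | x); illustrative only
target_cov = np.array([[2.0, 0.3, 0.0],
                       [0.3, 1.0, 0.1],
                       [0.0, 0.1, 0.5]])
target_prec = np.linalg.inv(target_cov)

def log_post(x):
    return -0.5 * x @ target_prec @ x

def metropolis(log_post, x0, prop_cov, n_iter, rng):
    # Random-walk Metropolis with a fixed multivariate Normal proposal
    d = len(x0)
    chain = np.empty((n_iter, d))
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    n_accept = 0
    for i in range(n_iter):
        x_new = rng.multivariate_normal(x, prop_cov)
        lp_new = log_post(x_new)
        if np.log(rng.uniform()) < lp_new - lp:   # Metropolis acceptance test
            x, lp = x_new, lp_new
            n_accept += 1
        chain[i] = x
    return chain, n_accept / n_iter

rng = np.random.default_rng(0)
d = 3

# (i) pilot chain with a rough diagonal proposal; approximate the posterior covariance
pilot, _ = metropolis(log_post, np.zeros(d), 0.1 * np.eye(d), 5000, rng)
sigma_hat = np.cov(pilot, rowvar=False)

# (ii) initial proposal covariance from Eq. (3.8)
prop_cov = (2.38 ** 2 / d) * sigma_hat

# (iii) run the chain and check that the acceptance rate settles between roughly 10 and 60 %
chain, acc_rate = metropolis(log_post, pilot[-1], prop_cov, 20000, rng)
print("acceptance rate:", round(acc_rate, 2))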
If the target distribution is far from multivariate Gaussian, the result will still
be slow mixing, but the chain will be more efficient than if left untuned.
The question remains: in the absence of knowledge of the true covariance of
the target distribution, how can one optimally choose the proposal covariance?
Adaptive algorithms are the most commonly used solution to this problem and
are nearly uniformly employed during a period that most authors refer to as
"burn-in", though the term burn-in is rather confusing, as it has been applied both
to the practice of proposal tuning and to the rejection of the initial portion of the chain
(see Sect. 3.3.2 below). Increasingly, adaptive algorithms are used over the length of
the chain (e.g., Haario et al. 2001, 2006; Roberts and Rosenthal 2007, 2009; Vrugt
et al. 2009; Vrugt and Ter Braak 2011), but a full discussion of this topic is beyond
the scope of this paper. Adaptive proposal tuning is typically done as follows. The
user first selects a proposal covariance (typically diagonal) under the assumption
that the individual parameter proposal variances are proportional to the realistic
range of values of each. A starting point for the chain is selected and Metropolis-
Hastings sampling commences. After a set of n iterations, the sample covariance
$\Sigma_n$ is computed and the proposal covariance matrix is updated using these values.
The new proposal covariance is then held fixed for the next m iterations, after which
the most recent set of n values is used to produce an updated proposal covariance.
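A sketch of this batch-update scheme in Python is given below, again with an illustrative 2-dimensional Gaussian standing in for the posterior; the batch sizes n and m, the initial diagonal variances, the reuse of the (3.8) scaling, and the small diagonal jitter term are all assumptions of this example:

import numpy as np

# Illustrative 2-D Gaussian stand-in for the posterior
prec = np.linalg.inv(np.array([[1.0, 0.8],
                               [0.8, 2.0]]))

def log_post(x):
    return -0.5 * x @ prec @ x

rng = np.random.default_rng(1)
d, n, m = 2, 500, 2000            # n: batch used to re-estimate; m: iterations between updates
prop_cov = np.diag([0.5, 0.5])    # initial diagonal proposal from each parameter's realistic range
x = np.zeros(d)
lp = log_post(x)
samples = []
next_update = n                   # first update after n iterations, then every m iterations

for it in range(1, 20001):
    x_new = rng.multivariate_normal(x, prop_cov)
    lp_new = log_post(x_new)
    if np.log(rng.uniform()) < lp_new - lp:    # Metropolis acceptance
        x, lp = x_new, lp_new
    samples.append(x.copy())
    if it == next_update:
        # Re-estimate the proposal from the most recent n samples, scaled as in (3.8);
        # the small diagonal jitter (an assumption here) keeps the proposal positive definite
        recent = np.asarray(samples[-n:])
        prop_cov = (2.38 ** 2 / d) * np.cov(recent, rowvar=False) + 1e-8 * np.eye(d)
        next_update += m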