Geoscience Reference
In-Depth Information
MCMC algorithm, however, is a function of how rapidly and thoroughly it samples
the posterior space. As such, while constructing a Metropolis-Hastings algorithm
is simple, ensuring its efficiency is not. In this section, we outline several practical
issues encountered in adapting an MCMC algorithm to a new problem and highlight
a number of best practices and potential pitfalls along the way.
3.3.1 Choice and Tuning of Proposal Distribution
It is clear from the formulation of the Metropolis-Hastings update (3.5) that the
proposal distribution plays an important role in the MCMC algorithm.
Proposals that result, on average, in large deviations from the current position will in
general lead to a smaller Hastings ratio (3.5) and a lower probability of acceptance, and
vice versa. A desirable property of any MCMC algorithm is that it mix thoroughly
and rapidly, not being confined to a small region of the state space. Because there
is often little knowledge of the shape of the posterior distribution, it is necessary
in practice to adjust the width (e.g., (co)variance) of the proposal distribution
to strike a balance between sampling rapidly enough to mix thoroughly (large
moves through the state space; large proposal width) and sampling finely enough
to resolve details of the probability distribution (small moves through the state
space; small proposal width). Because of this, much of the subtlety involved in
constructing an MCMC algorithm centers on (1) the choice of a suitable proposal
distribution and (2) the tuning of the distribution width. As mentioned above, choice of
a symmetric proposal distribution leads to $q(x_i; x) = q(x; x_i)$, which simplifies
the Metropolis-Hastings update. While there are variants of MCMC that use
non-symmetric proposal distributions (e.g., Langevin-Hastings MCMC; Roberts and
Rosenthal 1998), in all of the discussion that follows we will assume the use of a
symmetric proposal distribution. A common choice of proposal is Uniform, centered
on the current estimate. In this case, the tunable parameter is simply the width of
this Uniform distribution. The advantage of this is its simplicity, and indeed this
was the choice originally made by Metropolis et al. (1953). It is now common
to use a zero-mean multivariate Normal as the proposal distribution. This has the
advantage of consistently producing moves of about one standard deviation, but
with finite probability of much larger or smaller moves as well, allowing the chain
to more easily move between regions of the space containing localized probability
maxima.
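
The two symmetric proposals described above, and the effect of the proposal width on acceptance, can be illustrated with a minimal sketch. It is not the implementation used in this chapter: the target density (a standard Normal), the function names log_target and random_walk_metropolis, and the particular widths tried are all assumptions chosen only for illustration. Because both proposals are symmetric, the acceptance test uses only the ratio of target densities.

```python
import numpy as np


def log_target(x):
    # Hypothetical target: log-density of a standard Normal, up to a constant.
    return -0.5 * float(np.sum(x ** 2))


def random_walk_metropolis(log_target, x0, width, n_steps,
                           proposal="normal", seed=0):
    """Random-walk Metropolis sampler with a symmetric proposal.

    width is the standard deviation of the Normal proposal, or the full
    width of the Uniform proposal centered on the current position.
    Returns the chain and the fraction of proposals that were accepted.
    """
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    log_p = log_target(x)
    chain = np.empty((n_steps, x.size))
    n_accept = 0
    for i in range(n_steps):
        if proposal == "uniform":
            step = rng.uniform(-width / 2.0, width / 2.0, size=x.size)
        elif proposal == "normal":
            step = rng.normal(0.0, width, size=x.size)
        else:
            raise ValueError("proposal must be 'uniform' or 'normal'")
        x_new = x + step
        log_p_new = log_target(x_new)
        # Symmetric proposal: the proposal densities cancel, so the
        # acceptance ratio reduces to the ratio of target densities.
        # log1p(-u) with u in [0, 1) is the log of a draw from (0, 1].
        if np.log1p(-rng.random()) < log_p_new - log_p:
            x, log_p = x_new, log_p_new
            n_accept += 1
        chain[i] = x  # a rejected move repeats the current position
    return chain, n_accept / n_steps


# Acceptance rate for a narrow, a moderate, and a very wide Normal proposal.
rates = {}
for width in (0.05, 2.5, 50.0):
    chain, rates[width] = random_walk_metropolis(
        log_target, x0=[0.0], width=width, n_steps=5000)
    print(f"width={width:6.2f}  acceptance rate={rates[width]:.3f}")
```

Under these assumptions, the narrowest proposal should accept almost every move while barely moving, and the widest should reject most moves, which is the trade-off that tuning the width is meant to balance.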
Once a suitable proposal distribution has been chosen, the question naturally
arises as to how to tune it to sample the posterior distribution thoroughly and
efficiently. In essence, the question is: what makes one Markov chain
“better” than another? Desirable properties are rapid exploration of the space, fast
convergence to the target distribution, and production of a thorough and accurate
sample, in which no regions of the state space containing probability mass are missed. It is
clear from (3.5) that if the width of the proposal distribution (3.4) is small, virtually
all proposed moves will be accepted, but the movements will be very small and
$q(x_i; x) = q(x; x_i)$