Markov Chain Monte Carlo Methods: Theory and Applications - Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications


MCMC algorithm, however, is a function of how rapidly and thoroughly it samples

the posterior space. As such, while constructing a Metropolis-Hastings algorithm

is simple, ensuring its efficiency is not. In this section, we outline several practical

issues encountered in adapting an MCMC algorithm to a new problem and highlight

a number of best practices and potential pitfalls along the way.

3.3.1 Choice and Tuning of Proposal Distribution

It is clear from the formulation of the Metropolis-Hastings update (3.5) that the

proposal distribution $q$ plays an important role in the MCMC algorithm.

Proposals that result, on average, in large deviations from the current position will in

general lead to a smaller Hastings ratio (3.5) and a lower probability of acceptance, and

vice versa. A desirable property of any MCMC algorithm is that it mix thoroughly

and rapidly, not being confined to a small region of the state space. Because there

is often little knowledge of the shape of the posterior distribution, it is necessary

in practice to adjust the width (e.g., (co)variance) of the proposal distribution

to strike a balance between sampling rapidly enough to mix thoroughly (large

moves through the state space; large proposal width) and sampling finely enough

to resolve details of the probability distribution (small moves through the state

space; small proposal width). Because of this, much of the subtlety involved in

constructing an MCMC algorithm centers on (1) the choice of a suitable proposal

distribution and (2) the tuning of its width. As mentioned above, the choice of

a symmetric proposal distribution leads to $q(x_i, \hat{x}) = q(\hat{x}, x_i)$, which simplifies

the Metropolis-Hastings update. While there are variants of MCMC that use non-

symmetric proposal distributions (e.g., Langevin-Hastings MCMC; Roberts and

Rosenthal 1998 ), in all of the discussion that follows we will assume the use of a

symmetric proposal distribution. A common choice of proposal is Uniform, centered

on the current estimate. In this case, the tunable parameter is simply the width of

this Uniform distribution. The advantage of this is its simplicity, and indeed this

was the choice originally made by Metropolis et al. ( 1953 ). It is now common

to use a zero-mean multivariate Normal as the proposal distribution. This has the

advantage of consistently producing moves of about one standard deviation, but

with nonzero probability of much larger or smaller moves as well, allowing the chain

to more easily move between regions of the space containing localized probability

maxima.
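
The trade-off described above can be illustrated with a minimal random-walk Metropolis sketch. It is not the chapter's implementation: the one-dimensional standard Normal target and the names `log_target`, `propose`, and `run_chain` are assumptions chosen for illustration only. Both proposals (Uniform and zero-mean Normal, each centered on the current state) are symmetric, so the proposal densities cancel and the acceptance test uses the ratio of target densities alone.

```python
import math
import random


def log_target(x):
    """Unnormalized log-density of a standard Normal (illustrative target only)."""
    return -0.5 * x * x


def propose(x, width, kind, rng):
    """Draw a candidate from a symmetric proposal centered on the current state x."""
    if kind == "uniform":
        # Uniform on [x - width, x + width]
        return x + rng.uniform(-width, width)
    # Zero-mean Normal perturbation with standard deviation `width`
    return x + rng.gauss(0.0, width)


def run_chain(n_steps, width, kind="normal", x0=0.0, seed=0):
    """Random-walk Metropolis; returns (samples, acceptance_rate)."""
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    accepted = 0
    for _ in range(n_steps):
        x_new = propose(x, width, kind, rng)
        log_p_new = log_target(x_new)
        # Symmetric proposal: q cancels, leaving only the target ratio.
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = x_new, log_p_new
            accepted += 1
        samples.append(x)  # a rejected move repeats the current state
    return samples, accepted / n_steps


if __name__ == "__main__":
    for width in (0.05, 2.5, 50.0):
        samples, rate = run_chain(20000, width)
        mean = sum(samples) / len(samples)
        var = sum((s - mean) ** 2 for s in samples) / len(samples)
        print(f"width={width:6.2f}  acceptance={rate:.2f}  sample variance={var:.2f}")
```

Under these assumptions, the narrowest proposal should accept nearly every move while exploring the target slowly, and the widest should reject most moves. Passing `kind="uniform"` to `run_chain` swaps in the Uniform proposal with the same width parameter.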

Once a suitable proposal distribution has been chosen, the question naturally

arises of how to tune it so that it samples the posterior distribution thoroughly

and efficiently. In essence, the question is: what makes one Markov chain

“better” than another? Desirable properties are rapid exploration of the space, fast

convergence to the target distribution, and production of a thorough and accurate

sample in which no region of the state space containing probability mass is missed. It is

clear from (3.5) that if the width of the proposal distribution (3.4) is small, virtually

all proposed moves will be accepted, but the movements will be very small and

