According to Roberts and Rosenthal (2001), the optimal proposal covariance
update is

$$\Sigma_p = \frac{(2.38)^2}{d}\,\Sigma_n \qquad (3.9)$$

where, as above, $d$ is the dimension of the system of interest and $\Sigma_n$
is the sample covariance estimated from the chain history. Note that the sub-
sample size n (i.e., the number of iterations used to update the covariance) and the
frequency of update m are not necessarily equal, and reasonable choices (assuming
a Gaussian target distribution) are documented in Haario et al. ( 1999 ). Once the
acceptance rate converges to a value between 10 and 60%, the proposal
covariance is held fixed and the algorithm is allowed to run freely. It is standard
practice to discard the iterations used in adaptively tuning the proposal covariance. A
note of caution is merited here: for highly complex parameter covariances and large
dimensions, a very large number of samples may be required to estimate the
posterior covariance matrix robustly. The author has found that, in cases involving
a complicated posterior structure and an infrequent covariance update, the wrong
covariance may be specified, leading to inefficient sampling. In practice, a safe
choice is to update the proposal variances only (neglecting any information on the
parameter covariances), though this may result in a slightly less efficient algorithm.
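To make the procedure concrete, the following is a minimal sketch of a random-walk Metropolis sampler with the adaptive covariance update of Eq. (3.9), written in Python with NumPy. The log-posterior function log_post, the sub-sample size n, the update frequency m, and the diag_only flag (the variance-only safeguard mentioned above) are illustrative placeholders, not part of the original text.

```python
import numpy as np

def adaptive_metropolis(log_post, x0, n_iter, n=1000, m=500,
                        diag_only=False, seed=0):
    """Random-walk Metropolis with the proposal covariance
    periodically re-estimated via Eq. (3.9):
    Sigma_p = (2.38**2 / d) * Sigma_n."""
    rng = np.random.default_rng(seed)
    d = len(x0)
    cov_p = np.eye(d)                      # initial proposal covariance
    chain = np.empty((n_iter, d))
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    n_accept = 0
    for i in range(n_iter):
        x_prop = rng.multivariate_normal(x, cov_p)
        lp_prop = log_post(x_prop)
        # symmetric proposal, so the Hastings ratio reduces to
        # the ratio of posterior densities
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = x_prop, lp_prop
            n_accept += 1
        chain[i] = x
        # every m iterations, re-estimate Sigma_n from the
        # previous n samples and rescale per Eq. (3.9)
        if (i + 1) % m == 0 and (i + 1) >= n:
            sigma_n = np.atleast_2d(np.cov(chain[i + 1 - n:i + 1].T))
            if diag_only:                  # safer variance-only update
                sigma_n = np.diag(np.diag(sigma_n))
            # small jitter guards against a singular sample covariance
            cov_p = (2.38**2 / d) * sigma_n + 1e-10 * np.eye(d)
    return chain, n_accept / n_iter
```

A full implementation would additionally monitor the acceptance rate, freeze the proposal covariance once that rate settles into the 10-60% window, and discard the tuning iterations as burn-in, as described above.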
It should also be noted that the choice of starting point may influence
effectiveness of parameter tuning. For example, imagine the chain starts in a very
low-probability portion of the space within which the gradient in probability mass
is also small. In this case, each proposed point will have very similar likelihood, the
Hastings ratio will always be close to 1, and nearly all moves will be accepted.
It follows that, because of the large acceptance fraction, the proposal variance
will be tuned too large. Just as importantly, the sample covariances will be
unrepresentative of those in the true posterior PDF.
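This failure mode is easy to reproduce. The sketch below, a purely illustrative construction not drawn from the text, pairs a heavy-tailed (Cauchy-like) target, whose log-density is nearly flat deep in the tail, with a naive tuner that widens the proposal whenever the acceptance fraction looks too high:

```python
import numpy as np

# Cauchy-like target: deep in the tail, both the probability mass
# and its gradient are vanishingly small, so the log-density is
# nearly flat there.
log_post = lambda x: -np.log1p(x**2)

rng = np.random.default_rng(1)
x, step = 1e8, 1.0            # start far out in the flat tail
for sweep in range(20):
    accepted = 0
    for _ in range(100):
        x_prop = x + step * rng.normal()
        # the log-density barely changes, so the Hastings ratio ~ 1
        # and nearly every move is accepted
        if np.log(rng.uniform()) < log_post(x_prop) - log_post(x):
            x, accepted = x_prop, accepted + 1
    if accepted / 100 > 0.4:  # naive rule: high acceptance => widen
        step *= 2.0
print(step)  # vastly larger than the O(1) scale of the target's core
```

Because every sweep accepts nearly all moves, the step size doubles twenty times, ending orders of magnitude larger than the scale of the region containing the probability mass; samples gathered during this phase would likewise yield covariance estimates unrepresentative of the true posterior.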
3.3.2 The Initial Sample
There are two issues that must be considered when initiating the Markov chain at the
core of the MCMC algorithm: (1) the characteristics of the posterior sample should
not be sensitive to the values of the state variables at the start of the chain, and (2) it
is desirable to start the chain in a region that contains relatively large probability
mass. This is not only because these are the regions the sampling algorithm is
designed to characterize, but also because it is not desirable to include samples
in the chain that are associated with very low probability. This problem can be
illustrated by considering a tutorial example in which two parameters are estimated
from two observations. A random collection of 20,000 points drawn from a posterior
sample generated with a high-probability start point and well-tuned proposal is
shown in Fig. 3.2a. Three experiments are conducted. In the first, the start point
is located in a high-probability region; in the second, the start point is ten standard
deviations away from the mean; and in the third, the start point is ten standard deviations