As a consequence, the initial set of points in the chain is not representative of the
posterior PDF. In contrast, the chain initiated within a region of relatively high
probability (Fig. 3.2c) immediately begins sampling regions of relatively high
probability; however, its slow rate of mixing leads to inefficient sampling. When
the proposal variance is allowed to adapt (Fig. 3.2d), the chain rapidly converges
to efficient posterior sampling, though it is clear that the first portion of the chain
is still not representative. This example illustrates the fact that, even with a poorly
chosen start point, effective tuning of the proposal distribution can produce a robust
sample.
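To make the idea of tuning concrete, the following is a minimal sketch of one way a random-walk proposal scale can be adapted toward a target acceptance rate. The target rate of 0.44 (a common heuristic for one-dimensional targets), the Robbins-Monro-style damping schedule, and the standard-normal test target started far from the mode are illustrative assumptions, not the specific scheme behind Fig. 3.2.

```python
import numpy as np

def adaptive_metropolis(log_post, x0, n_iter=5000, target_rate=0.44, seed=0):
    """1-D random-walk Metropolis whose proposal standard deviation is
    adapted toward a target acceptance rate with diminishing step sizes."""
    rng = np.random.default_rng(seed)
    x, lp = float(x0), log_post(float(x0))
    scale = 1.0
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = x + scale * rng.standard_normal()
        lp_prop = log_post(prop)
        accepted = np.log(rng.random()) < lp_prop - lp
        if accepted:
            x, lp = prop, lp_prop
        chain[i] = x
        # Grow the scale after acceptances, shrink it after rejections;
        # the 1/sqrt(i+1) damping lets the adaptation die out over time.
        scale *= np.exp((float(accepted) - target_rate) / np.sqrt(i + 1.0))
    return chain, scale

# Illustrative target: a standard normal, with the chain started far
# from the mode, analogous to the poorly chosen start point discussed above.
chain, final_scale = adaptive_metropolis(lambda x: -0.5 * x**2, x0=20.0)
```

Even from the distant start point, the damped adaptation allows the proposal scale to settle while the chain drifts into the high-probability region, after which the later portion of the chain samples the target efficiently.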
To eliminate dependence of the posterior sample on the start point, and to ensure
posterior samples are representative of the target distribution, it is common practice
to discard a portion of the beginning of the Markov chain. However, there is
disagreement as to how much of the chain should in general be thrown out.
Some authors suggest running a chain until convergence has been determined, then
discarding the first half of the chain (Gelman et al. 2004). Others note that, if a
suitable starting point is chosen, then there is no need to discard any samples at
all (Haario et al. 1999; Geyer 2011). In practice, it is difficult to know a priori
whether the chosen starting point lies in a region of sufficient probability density.
Diagnostic examination of the properties of the chain after a suitable number of
iterations typically reveals how many samples should be discarded. In general,
the Markov chain can be said to have “forgotten” the initial position once the
lagged autocorrelation between a given point in the chain and the starting point is
sufficiently close to zero. The resulting number of samples constitutes the minimum
number that should be discarded. Comparison of the likelihood P(y|x) of the
posterior mean with likelihood values near the beginning of the chain often reveals
which values near the start of the chain are associated with very low probability and
should be removed. Note that we are drawing a distinction between the practice of
discarding a number of initial samples and the practice of so-called burn-in, which
is often conflated with tuning of the proposal distribution.
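The lagged-autocorrelation criterion can be operationalized roughly as follows. This is a sketch under assumed choices, a 0.05 cutoff and an AR(1) test series with coefficient 0.9, not a prescription from the text; in practice the cutoff and the diagnostic used vary between authors.

```python
import numpy as np

def burn_in_from_autocorrelation(chain, threshold=0.05):
    """Return the smallest lag at which the chain's empirical
    autocorrelation first falls below `threshold` in absolute value;
    a heuristic for the minimum number of initial samples to discard."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = np.dot(x, x) / n
    for lag in range(1, n):
        rho = np.dot(x[:-lag], x[lag:]) / (n * var)
        if abs(rho) < threshold:
            return lag
    return n

# Illustrative series: an AR(1) chain with coefficient 0.9, whose true
# autocorrelation decays as 0.9**lag and crosses 0.05 near lag ~29.
rng = np.random.default_rng(1)
x = np.empty(20000)
x[0] = 0.0
for t in range(1, len(x)):
    x[t] = 0.9 * x[t - 1] + rng.standard_normal()
lag = burn_in_from_autocorrelation(x)
```

The returned lag estimates how long the chain takes to "forget" a given state, and hence the minimum number of initial samples to discard from a chain started at an arbitrary point.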
3.3.3 Single Versus Multiple Chains
Multi-core computing has become common in most research environments, and it
is now standard practice to run multiple Markov chains in parallel (Gelman et al.
2004). The motivation behind doing this is primarily computational efficiency; once
each chain has converged to sampling the target distribution, samples from all chains
can be combined, greatly increasing the sample size in the process. In
theory, this practice can be effective, provided each chain is constructed as carefully
as would be done with a single chain. There are, however, a number of potential
pitfalls. The first is the temptation to replace a single long chain with multiple short
chains, with the goal of obtaining the same sample size in a shorter period of time.
This approach has been used to great effect in computationally demanding problems, and for
applications that require rapid solutions (e.g., Delle Monache et al. 2008). However,