B7.3.2 Simple Monte Carlo Sampling
The sampling problem is much more difficult when we do not know the shape of the likelihood
surface before starting sampling. There may also be little prior information to guide the search,
since it can be difficult to estimate beforehand what effective values of the parameters (and
their potential interactions) might be needed to get good results from a model in matching the
available observations. It is quite common, for example, to only estimate some prior upper
and lower limits for particular parameters, without any real idea of what distribution to assume
between those limits.
Without such prior information there is initially little to guide the sampling strategy. So a
rather obvious choice of strategy is to use simple Monte Carlo sampling, in which random
values of each parameter are chosen independently across the specified ranges. Where prior
information is available, this is easily modified to take samples of equal probability by sampling
uniformly across the cumulative probability range; this will result in a sampling density proportional to
the prior probability. The only problem with this very simple strategy is that of taking enough
samples. Similar issues arise as for the forward uncertainty estimation problem, except that
now we do not know where to concentrate the search. If only a small number of samples are
taken, areas of higher likelihood on the surface might be missed. The method is therefore only
useful where model runs that might be retained as behavioural (higher likelihood) are spread
through the parameter space. Otherwise, very large numbers of samples might be required to
define the shape of local areas of high likelihood. However, this is commonly the case where
non-statistical likelihood measures are used within the GLUE methodology, and many GLUE
applications have used this type of simple Monte Carlo sampling. Refinement of the simple
sampling strategy is also possible by discretising the space into areas where high likelihoods
and low likelihoods have been found by some initial search and then concentrating sampling
in the sub-spaces of high likelihood. A variety of methods are described by Beven and Binley
(1992), Spear et al. (1994), Shorter and Rabitz (1997), Bárdossy and Singh (2008) and Tonkin
and Doherty (2009).
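The simple strategy can be sketched as follows. This is a minimal illustration under stated assumptions: the two-parameter `toy_model`, its prior ranges, the synthetic observations, the use of the Nash-Sutcliffe efficiency as an informal likelihood measure and the behavioural threshold of 0.9 are all hypothetical choices, not taken from the text. `uniform_sample` draws each parameter independently across its range. `prior_sample` shows the cumulative-probability variant by inverting an assumed prior CDF for one parameter.

```python
import math
import random

# Hypothetical prior ranges (lower, upper) for a two-parameter toy model:
# only the limits are assumed known.
RANGES = {"a": (0.0, 2.0), "b": (0.0, 5.0)}


def toy_model(theta, inputs):
    """Hypothetical stand-in for a model run: a simple linear response."""
    return [theta["a"] * x + theta["b"] for x in inputs]


def nse(obs, sim):
    """Nash-Sutcliffe efficiency, used here as an informal likelihood measure."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def uniform_sample(rng):
    """Simple Monte Carlo: each parameter drawn independently across its range."""
    return {p: rng.uniform(lo, hi) for p, (lo, hi) in RANGES.items()}


def prior_sample(rng):
    """Sample uniformly in cumulative probability u and invert each prior CDF,
    so the sampling density is proportional to the prior density.
    Assumed priors: 'a' has a density rising linearly from its lower limit,
    with CDF ((x - lo) / (hi - lo))**2; 'b' is uniform."""
    lo_a, hi_a = RANGES["a"]
    lo_b, hi_b = RANGES["b"]
    a = lo_a + (hi_a - lo_a) * math.sqrt(rng.random())
    b = lo_b + (hi_b - lo_b) * rng.random()
    return {"a": a, "b": b}


def behavioural_sets(sampler, obs, inputs, n, threshold, seed=42):
    """Draw n independent parameter sets; keep those scoring above threshold."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        theta = sampler(rng)
        score = nse(obs, toy_model(theta, inputs))
        if score > threshold:
            kept.append((theta, score))
    return kept


# Synthetic "observations" generated from assumed true values a = 0.8, b = 1.5
inputs = [1.0, 2.0, 3.0, 4.0, 5.0]
obs = toy_model({"a": 0.8, "b": 1.5}, inputs)

uniform_kept = behavioural_sets(uniform_sample, obs, inputs, 10_000, 0.9)
prior_kept = behavioural_sets(prior_sample, obs, inputs, 10_000, 0.9)
print(len(uniform_kept), len(prior_kept))
```

Under these assumptions only a few per cent of the 10,000 runs are retained as behavioural. That fraction falls rapidly as the behavioural region becomes more localised or as more parameters are added, which is the sample-size problem described above.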
B7.3.3 Importance Sampling: Monte Carlo Markov Chain
As noted elsewhere, statistical likelihood functions tend to stretch the response surface greatly,
resulting in one or more areas of high likelihood that are highly localised. This means that
simple Monte Carlo search algorithms would be highly inefficient for such cases and a more
directed strategy is needed to define the shape of the surface with any degree of detail. Most
strategies of this type are adaptive in the sense of using past samples to guide the choice of
new samples and have the aim of finishing with a set of samples that are distributed in the
parameter space with a density that is directly proportional to the local likelihood. This is a
form of importance sampling. The most widely used techniques for importance sampling in
hydrological modelling are those of the Monte Carlo Markov Chain (MC²) family. They have
been used in rainfall-runoff modelling at least since Kuczera and Parent (1998).
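The stretching effect can be made concrete with a short calculation. This sketch assumes a Gaussian error model with independent errors and profiles out the unknown error variance. That gives a likelihood proportional to (σ_e²)^(-n/2) for n observations, where σ_e² is the error variance of a model run. The text does not commit to this particular likelihood, and the variances and record lengths below are hypothetical.

```python
import math


def log10_likelihood_ratio(var_a, var_b, n):
    """log10(L_a / L_b) when L is proportional to (error variance)**(-n / 2),
    the form obtained for Gaussian, independent errors after profiling out
    the unknown error variance (an illustrative assumption)."""
    return -0.5 * n * (math.log10(var_a) - math.log10(var_b))


# Two hypothetical parameter sets whose error variances differ by only 10 %
for n in (10, 100, 1000, 10000):
    ratio = log10_likelihood_ratio(1.0, 1.1, n)
    print(f"n = {n:>5}: parameter set A is 10**{ratio:.1f} times more likely")
```

Two parameter sets whose error variances differ by only 10% are almost indistinguishable with 10 observations, with a likelihood ratio of about 1.6. With 1000 observations the ratio is about 5 × 10^20, so almost all of the posterior probability collapses onto a small region of the parameter space.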
The concept that underlies MC² sampling is quite simple to understand (see also Beven,
2009). The scheme starts with a set of random samples chosen according to some proposal
scheme. Each chosen point represents a parameter set. The model is run with that parameter
set and a posterior likelihood for that point is calculated. A new set of points around that
point is then chosen, consistent with the proposal distribution. Whether a model run is made
at the new point depends on the likelihood of the original point and a random number, so
that there is a probability of making a run even if the likelihood of the original point was low.
This is to guard against not sampling regions of the space where a new high likelihood area
might be found. Once the sampling is complete, the chain is checked to see if the sample of
points is converging on a consistent posterior distribution. If not, another iteration is carried
out, which might involve adapting the proposal distribution to refine the sampling. The process
is effectively a chain of random walks across the likelihood surface, where the probability of