so that they are considered independently of any model residuals (we say “ideally” because it is difficult
to make an assessment of the potential impact of input errors independently of the model). It is therefore
a rejectionist approach to model evaluation such that, for any observation in space and time, O(x, t), a
model with parameter set θ is only retained as behavioural if:
O_min(x, t) ≤ M(θ, x, t) ≤ O_max(x, t)                    (7.5)
where M(θ, x, t) is the model prediction. Within that range, the performance of the model can also be
scaled so that observations of quite different characteristics can be assessed in a similar manner. To retain
information about whether the model is over- or under-predicting, a normalised score in the range −1 to
+1 can be used, with 0 at the value of the observation. The limits do not need to be symmetric on either
side of the observation since, given the limits defined for that observation, the prediction can always be
scaled back from the score.
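As a minimal sketch of the test in Equation (7.5) and the normalised score (the observation, limits and predictions used here are hypothetical, chosen only for illustration), in Python:

# Limits-of-acceptability test of Equation (7.5) and normalised score in [-1, +1].
# Limits are assumed to lie strictly either side of the observation.

def normalised_score(prediction, obs, lower, upper):
    """Return None if the prediction lies outside [lower, upper] (rejected for
    this observation); otherwise a score in [-1, +1], negative for
    under-prediction and positive for over-prediction, 0 at the observation."""
    if not (lower <= prediction <= upper):
        return None  # non-behavioural for this observation
    if prediction >= obs:
        return (prediction - obs) / (upper - obs)   # 0 at obs, +1 at the upper limit
    return (prediction - obs) / (obs - lower)       # 0 at obs, -1 at the lower limit

# Example with asymmetric limits around an observed discharge of 2.4
print(normalised_score(2.7, obs=2.4, lower=2.0, upper=3.0))   # 0.5  (over-prediction)
print(normalised_score(2.2, obs=2.4, lower=2.0, upper=3.0))   # -0.5 (under-prediction)

Because the score is defined relative to the limit on each side of the observation, asymmetric limits are handled automatically.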
The scores can also be used to construct a performance or likelihood measure within the acceptable
range (see Box 7.1). The simplest index of performance would be the triangular function from zero at the
upper and lower limits to one at the observed value. As with analogous fuzzy measures, however, other
functions might be suitable, such as a trapezoidal measure allowing for an equally likely range around
the observation (Beven, 2006a). Different types of observation and soft information about performance
are also easily incorporated into this approach (see, for example, the work of Blazkova and Beven, 2009).
An example application is developed in Section 7.12. Other earlier GLUE studies have effectively used
limits of acceptability for individual observations of this type in model evaluation (e.g. Iorgulescu et al.,
2005, 2007; Pappenberger et al., 2006b, 2007a; Page et al., 2007).
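By way of an illustrative sketch only (the width of the central plateau in the trapezoidal case is an arbitrary assumption, not a value from the text), the triangular and trapezoidal weighting functions can be written in terms of the normalised score s:

# Weighting functions defined on the normalised score s in [-1, +1];
# both reach zero at the limits of acceptability.

def triangular_weight(s):
    """1 at the observation (s = 0), declining linearly to 0 at s = -1 or +1."""
    return max(0.0, 1.0 - abs(s))

def trapezoidal_weight(s, plateau=0.25):
    """1 for |s| <= plateau (an equally likely range around the observation),
    then declining linearly to 0 at the limits; the plateau half-width of 0.25
    is an illustrative choice only."""
    if abs(s) <= plateau:
        return 1.0
    return max(0.0, (1.0 - abs(s)) / (1.0 - plateau))

print(triangular_weight(0.5))    # 0.5
print(trapezoidal_weight(0.5))   # 0.666...

Both functions fall to zero at the limits of acceptability, so a simulation that only just satisfies Equation (7.5) contributes little weight to the resulting likelihood measure.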
A rejectionist approach of this type will reveal periods where models do not provide adequate predictions.
This might be because of limitations of the model structure or because the model is being driven with
inadequate input data. We do not want to make the error of rejecting a good model because of poor data. Thus the
limits of acceptability approach also focuses attention on the quality of calibration data, some of which
might not be informative in deciding on which models are good hypotheses about catchment response.
7.10.5 Updating Likelihood Measures
If more than one period of data is available for evaluating the model, or if new data become available, then
the likelihood measures from each period can be combined in a number of different ways, as shown in
Box 7.2. This can be viewed as an updating procedure. At each stage, including after the first period, there
is a prior likelihood associated with each parameter set that is combined with the value of the likelihood
measure for the period being used for evaluation to calculate a posterior value. Bayes equation is one
way of doing such calculations that is well known in statistical theory, but it is not the only one (Box 7.2).
The posterior from one period then becomes the prior for the next application. The likelihood measures
for a given parameter set for the periods may be correlated; indeed, one should hope it is the case that if
a model performs well in one calibration period, it will continue to perform well in other periods. If this
is not the case then its combined likelihood measure will be reduced.
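As a minimal sketch of the multiplicative (Bayes-type) form of such updating for an ensemble of parameter sets (the likelihood values and ensemble size here are purely illustrative; other combination rules are given in Box 7.2):

import numpy as np

# Sequential updating of likelihoods for an ensemble of parameter sets.
# Each row of period_likelihoods holds the likelihood measure evaluated for
# every retained parameter set over one evaluation period (illustrative values).

def update(prior, likelihood):
    """Bayes-like multiplicative combination, renormalised to sum to one."""
    posterior = prior * likelihood
    total = posterior.sum()
    if total == 0:
        raise ValueError("All parameter sets rejected; no behavioural models remain.")
    return posterior / total

prior = np.full(4, 0.25)                                 # uniform prior over 4 parameter sets
period_likelihoods = np.array([[0.6, 0.2, 0.9, 0.0],     # period 1
                               [0.5, 0.4, 0.8, 0.7]])    # period 2
for L in period_likelihoods:
    prior = update(prior, L)                             # posterior becomes the next prior
print(prior)   # the fourth parameter set keeps zero weight after rejection in period 1

Note that, with this multiplicative form, a parameter set given zero likelihood in any single period is rejected outright, which is consistent with the rejectionist philosophy described above.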
It is possible that, in combining two measures from different observed variables during the same
calibration period, there will be a correlation in model performance against different variables, i.e. a
model that produces good simulations of an output variable might equally produce good simulations
of an internal state variable (although it has to be said that this does not necessarily follow in many
environmental models). If a model produces good simulations on both variables, its relative likelihood
is raised; if it does not, its relative likelihood is lowered.
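The same multiplicative sketch applies to combining measures for two variables within one period (the discharge and internal state likelihood values below are again hypothetical):

import numpy as np

# Combining likelihoods for two observed variables in the same period; a model
# must perform acceptably on both to retain a high relative likelihood.
discharge_L   = np.array([0.8, 0.6, 0.9, 0.3])   # e.g. against observed discharge
water_table_L = np.array([0.7, 0.1, 0.8, 0.9])   # e.g. against an internal state variable
combined = discharge_L * water_table_L
combined /= combined.sum()
print(combined)   # the third parameter set performs well on both and gains weight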
The choice of method of combining likelihood measures may have implications for the choice of the
measure itself, in particular if it is required that multiple combinations (for example, of measures from
different periods of data) have the same result as treating the data as a single continuous period (where this