not well behaved in this statistical sense. This is most obvious in the way that models that
have very similar error variance can have orders of magnitude difference in likelihood when
Equations (B7.1.5) or (B7.1.6) are applied to large numbers of residual errors. The problem then
is what type of likelihood measure might more properly reflect the real information content
of the conditioning data for these non-ideal cases (Beven, 2006a). The GLUE methodology,
for example, has been used with a variety of informal likelihood measures, such as the Nash-
Sutcliffe efficiency measure or the inverse exponential variance measure used in the case study
in Section 7.11.
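As a concrete illustration (not from the text), both measures can be sketched in a few lines of Python. The data are invented, and the exact form and scaling of the inverse exponential variance measure used in the Section 7.11 case study may differ from the simple form assumed here:

```python
import numpy as np

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the residual
    sum of squares to the sum of squares of the observations about
    their mean. Equals 1 for a perfect fit, can be negative."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def inv_exp_variance(obs, sim, scale=1.0):
    """One simple informal likelihood of inverse exponential variance
    type: exp(-error variance / scale). Assumed form for illustration;
    larger error variance gives a smaller likelihood."""
    err_var = np.var(np.asarray(obs, float) - np.asarray(sim, float))
    return np.exp(-err_var / scale)

# Invented observed and simulated series for a single model run.
obs = np.array([1.0, 2.0, 3.0, 2.5, 1.5])
sim = np.array([1.1, 1.9, 2.8, 2.6, 1.4])
print(nash_sutcliffe(obs, sim))    # close to 1 for a good fit
print(inv_exp_variance(obs, sim))  # between 0 and 1
```

Both measures rise as the fit improves, but they weight the same residuals very differently, which is the point at issue in comparing informal measures.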
Smith et al. (2008a) have analysed a number of informal likelihood measures, examining
in particular the extent to which an informal likelihood can be consistent with the axioms
of probability as an expression of belief in a particular model realisation. Such belief should
reflect how far the simulated variables are unbiased with respect to the observations, rise and
fall with a similar pattern to the observations, and show similar variability to the observations.
Note that expressing belief in this way is not inconsistent with the original Bayes (1763) concept
of taking account of the evidence in assessing the odds for belief in a particular hypothesis.
The essential requirement of such a measure is that it should increase monotonically as
the fit to the observations improves (though the goodness of fit is usually defined with respect
to the particular measure being used). Within the GLUE framework, the likelihood can also
be set to zero for models that are rejected as nonbehavioural, which might involve defining
some additional threshold conditions as local (for each single observation) or global (over all
observations) limits of acceptability.
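The rejection step can be sketched as follows (a minimal illustration with an invented ensemble of efficiency scores and a hypothetical threshold; GLUE applications choose both the measure and the threshold case by case):

```python
import numpy as np

# Invented efficiency scores for five candidate parameter sets.
efficiencies = np.array([0.82, 0.65, 0.30, -0.10, 0.71])

# Models below a user-chosen threshold of acceptability are rejected
# as nonbehavioural: their likelihood is set to zero.
threshold = 0.5
likelihood = np.where(efficiencies > threshold, efficiencies, 0.0)

# Rescale so the likelihoods of the retained models sum to one, as
# when weighting ensemble predictions in GLUE.
weights = likelihood / likelihood.sum()
print(weights)  # nonbehavioural models carry zero weight
```

Global limits of acceptability act once on a summary score as here; local limits would instead test each observation separately before a model is retained.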
Concentrating for the moment on the Nash-Sutcliffe efficiency measure (Equation (7.3)),
Smith et al. show how this can be thought of as the combination of three positive components
in the form:
E = 2r(σ_s/σ_o) − (σ_s/σ_o)² − ((s̄ − ō)/σ_o)²   for E > 0   (B7.1.8)

where r is the linear correlation coefficient between the observed o and simulated s variables,
with means ō and s̄ and variances σ_o² and σ_s² respectively.
The first term summarises the ability to reproduce the pattern of the data through the linear
correlation. The second and third terms penalise differences in the variance and mean of the
two data series. The difference in the mean is penalised relative to the standard deviation of
the observed series. The second term penalises deviations of the ratio of standard deviations
away from r, rather than 1 which would be desirable. The second term combined with the
first term indicates that a positive efficiency value is only possible when r is positive. This
type of decomposition provides the basis for evaluating different measures with respect to the
axioms of probability. It is shown that the efficiency measure (when scaled such that the sum
over all the ensemble of models considered is unity) does satisfy the axioms, whereas some
other objective functions, such as Willmott's Index of Agreement or, more importantly, a total
volume error measure, do not (see the work of Smith et al. (2008a) for details and Guinot et al.
(2011) for an argument in favour of using the balance error as a model evaluation criterion).
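The decomposition can be verified numerically: for any pair of series, the three-term form in correlation, standard deviation ratio and scaled mean difference reproduces the directly computed efficiency (a sketch with invented data, using population moments throughout):

```python
import numpy as np

# Invented observed and simulated series.
obs = np.array([1.0, 2.2, 3.1, 2.4, 1.6, 0.9])
sim = np.array([1.2, 2.0, 2.9, 2.6, 1.4, 1.1])

# Direct Nash-Sutcliffe efficiency (Equation (7.3)).
nse = 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Three-term decomposition (Equation (B7.1.8)): linear correlation r,
# ratio of standard deviations, and mean difference scaled by the
# standard deviation of the observations (population moments, ddof=0).
r = np.corrcoef(obs, sim)[0, 1]
alpha = sim.std() / obs.std()
beta = (sim.mean() - obs.mean()) / obs.std()
nse_decomposed = 2.0 * r * alpha - alpha ** 2 - beta ** 2

# The two routes agree (up to floating-point rounding).
assert np.isclose(nse, nse_decomposed)
print(nse, nse_decomposed)
```

The identity follows from expanding the residual sum of squares into variance, covariance and bias terms, which is why consistent (population) moments must be used in all three components.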
They also show, in a hypothetical example for which the true results are known, that such
informal measures do not produce the correct conditioning of posterior parameter distribu-
tions for residuals with a consistent bias. In this, however, they are no worse than a formal
likelihood that does not explicitly account for the bias (e.g. Equation (B7.1.5)), reinforcing the
fact that the assumptions of a formal likelihood measure should always be checked against the
characteristics of the actual model residuals.
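Such a check can be kept very simple: the bias, spread and serial correlation of the residual series already indicate whether the usual formal-likelihood assumptions of zero-mean, independent errors are tenable (a minimal sketch; the diagnostics chosen here are illustrative, not a prescribed test suite):

```python
import numpy as np

def residual_checks(obs, sim):
    """Basic diagnostics for the residual series obs - sim: mean
    (bias), standard deviation, and lag-1 autocorrelation. A formal
    Gaussian likelihood that assumes zero-mean, independent errors
    is suspect when the bias or lag-1 correlation is large."""
    e = np.asarray(obs, float) - np.asarray(sim, float)
    bias = e.mean()
    sd = e.std()
    lag1 = np.corrcoef(e[:-1], e[1:])[0, 1]
    return bias, sd, lag1
```

Run on an actual residual series, a nonzero bias or strong lag-1 correlation signals that either the likelihood should model those features explicitly or the conditioning will be misleading, for informal and formal measures alike.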
A further feature of the informal likelihood measures considered was the way in which they
continue to condition the posterior parameter distributions as more data are added. This was
the essence of the criticism by Mantovan and Todini (2006) of the use of the Nash-Sutcliffe
efficiency in GLUE. When used as a global measure, the efficiency will asymptotically reduce
the rate of conditioning as new data are added. However, as pointed out by Beven et al. (2008),
this is a choice. An alternative choice would be to use the efficiency as a common sense