leading to a likelihood function for iid Gaussian errors of the form:
$$
L(\theta \mid M(\theta, I)) = \left(2\pi\sigma^2\right)^{-N/2}
\exp\left[-\frac{1}{2\sigma^2}\sum_{t=1}^{N}\varepsilon_t^2\right]
\qquad \text{(B7.1.5)}
$$
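As a minimal sketch of how Equation (B7.1.5) is evaluated in practice (the function name and the synthetic residual series are illustrative assumptions, not from the text), the log of the iid Gaussian likelihood can be computed as:

```python
import numpy as np

def gaussian_loglik(residuals, sigma2):
    """Log of Equation (B7.1.5): iid Gaussian errors with variance sigma2."""
    n = len(residuals)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum(residuals**2) / (2 * sigma2)

# Illustrative synthetic residual series (not real calibration data)
rng = np.random.default_rng(42)
eps = rng.normal(0.0, 1.0, size=1000)
print(gaussian_loglik(eps, 1.0))
```

Working in the log domain from the outset avoids the computer rounding problems with very small likelihoods discussed below.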
There is a large body of likelihood function theory developed for different sets of assumptions.
Assuming, for example, that the errors have a mean bias and are correlated in time yields a
likelihood function of the form:
$$
L(\theta \mid M(\theta, I)) = \left(2\pi\sigma^2\right)^{-N/2}\left(1-\alpha^2\right)^{1/2}
\exp\left\{-\frac{1}{2\sigma^2}\left[\left(1-\alpha^2\right)\left(\varepsilon_1-\mu\right)^2
+ \sum_{t=2}^{N}\left(\varepsilon_t-\mu-\alpha\left(\varepsilon_{t-1}-\mu\right)\right)^2\right]\right\}
\qquad \text{(B7.1.6)}
$$

where $\mu$ is the mean residual error (bias) and $\alpha$ is the lag-1 correlation coefficient of the residuals.
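A sketch of the log of Equation (B7.1.6), assuming a NumPy environment (the function name is a hypothetical helper; with `mu = 0` and `alpha = 0` it reduces to the log of Equation (B7.1.5)):

```python
import numpy as np

def ar1_bias_loglik(eps, mu, alpha, sigma2):
    """Log of Equation (B7.1.6): residuals with mean bias mu and
    lag-1 autocorrelation alpha (|alpha| < 1), error variance sigma2."""
    n = len(eps)
    e = eps - mu
    # First term treats eps_1 separately; remaining terms are the
    # one-step-ahead AR(1) innovations.
    ss = (1 - alpha**2) * e[0]**2 + np.sum((e[1:] - alpha * e[:-1])**2)
    return (-0.5 * n * np.log(2 * np.pi * sigma2)
            + 0.5 * np.log(1 - alpha**2)
            - ss / (2 * sigma2))
```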
Note that in Equations (B7.1.5) and (B7.1.6) there is a term in $(2\pi\sigma^2)^{-N/2}$. When the calibration
data are based on time series of observations, the value of N can be very large. This means
that a formal likelihood of this type will take models of similar error variance and stretch
the difference between them. The effect is reduced by the other bias and correlation terms
in Equation (B7.1.6) but can still involve small differences in error variance being stretched
to orders of magnitude differences in likelihood. In fact, the calculations are normally made
in log likelihood transformation to avoid small likelihoods being lost in computer rounding
error. The result is to produce a highly peaked likelihood surface and focus attention on the
models with the highest likelihood (even if that model might have only a very slightly smaller
error variance than many other models). Such a stretching of the surface is a consequence of
assuming that every residual contributes to the overall likelihood in a multiplicative (but not
zero) way (as in the product of Equation (B7.1.5)).
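The stretching effect can be illustrated numerically. In the sketch below the sample size and the 2% difference in error variance are purely illustrative assumptions; each maximised log-likelihood is the log of Equation (B7.1.5) with $\sigma^2$ set to its maximum-likelihood estimate:

```python
import numpy as np

def max_loglik(s2, n):
    """Maximised Gaussian log-likelihood (log of Equation (B7.1.5))
    for a model whose N residuals have sample variance s2."""
    return -0.5 * n * (np.log(2 * np.pi * s2) + 1.0)

n = 5000                      # long calibration time series
ll_a = max_loglik(1.00, n)    # model A: residual variance 1.00
ll_b = max_loglik(1.02, n)    # model B: residual variance 1.02 (2% larger)

# A 2% difference in error variance is stretched to roughly 21 orders
# of magnitude in the likelihood ratio:
print((ll_a - ll_b) / np.log(10))  # ~21.5 decades
```

This is why attention becomes focused so sharply on the model with the very highest likelihood, even when many other models fit almost as well.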
The question is whether such stretching is justified in real cases. Why should models with
only slight differences in residual variance have orders of magnitude difference in likelihood?
That does not seem reasonable (even if consistent with the theory) when we know little of
the various sources of error in the modelling process. The sensitivity of the stretching does
mean that different realisations of error in the calibration data (such as provided by different
calibration periods) might result in peaks in the surface being in quite different places. Indeed,
even in hypothetical cases where we can ensure that the model is correct, that we know the
true parameter distributions, and the assumptions about the structure of the error are correct,
different error realisations can result in biased parameter estimates using a statistical likelihood
of this type (see the work of Kuczera, 1983, and Beven et al., 2008a). In the hypothetical
case, the addition of more observations should result in convergence on the true parameter
distributions. In real applications, we cannot be sure of the same asymptotic behaviour.
Another common feature of actual residual series is that they are heteroscedastic . This means
that the variance of the errors changes with the magnitude of the prediction (in rainfall-runoff
modelling, it is common for the residuals at higher discharges to be greater for a variety of
reasons). It is possible to make some assumption about the nature of the dependence of residual
variance in deriving a likelihood function (e.g. Schoups and Vrugt, 2010), but a more common
strategy is to transform the residuals such that they have a more constant variance and are
more nearly Gaussian in distribution. The most common transformation used is the Box-Cox
transformation (Box and Cox, 1964). This is defined as:
$$
\varepsilon_t^{*} =
\begin{cases}
\dfrac{\varepsilon_t^{\lambda} - 1}{\lambda} & \text{for } \lambda \neq 0 \\[6pt]
\ln \varepsilon_t & \text{for } \lambda = 0
\end{cases}
\qquad \text{(B7.1.7)}
$$

where $\varepsilon_t^{*}$ is the transformed value of the variable $\varepsilon_t$ and $\lambda$ is a constant chosen so as to minimise the skewness of the transformed values. Note that if the original residual series exhibits autocorrelation, the transformed residuals will also be autocorrelated.
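A sketch of Equation (B7.1.7), assuming a NumPy environment; the simple grid search over $\lambda$ and the skewness helper are illustrative assumptions (in practice $\lambda$ might be estimated more formally), and the positively skewed synthetic series stands in for a real residual sequence:

```python
import numpy as np

def skewness(x):
    """Sample skewness of a series (population moments, for illustration)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z**3)

def box_cox(x, lam):
    """Equation (B7.1.7): Box-Cox transform of a positive series x."""
    if abs(lam) < 1e-8:          # treat lambda ~ 0 as the log branch
        return np.log(x)
    return (x**lam - 1.0) / lam

def best_lambda(x, grid=np.linspace(-2, 2, 401)):
    """Choose lambda to minimise the skewness of the transformed values,
    as described in the text (crude grid search for illustration)."""
    return min(grid, key=lambda lam: abs(skewness(box_cox(x, lam))))

rng = np.random.default_rng(1)
eps = rng.lognormal(0.0, 0.5, size=2000)  # positively skewed "residuals"
lam = best_lambda(eps)
print(lam, skewness(box_cox(eps, lam)))
```

For a lognormally distributed series such as this one, the selected $\lambda$ should lie close to zero, since the log transform makes the series Gaussian.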
 