the nature of the catchment processes and it is not clear whether we have really reached this stage, given the limitations of current physically based process descriptions in representing the nonlinearities of hydrological processes at the scales of hillslopes and catchments, and the difficulties in estimating effective values of the model parameters (see Chapter 9).
12.3 Models as Hypotheses
In testing models as hypotheses, these types of uncertainty in the modelling process mean that there is always the possibility of accepting a poor model when it should be rejected (a false positive or Type I error) or rejecting a good model when it should be accepted (a false negative or Type II error). In statistical hypothesis testing, we generally choose to test a hypothesis only with a certain probability of being wrong (e.g. at the 5% level). It is difficult to carry this over to simulation models applied to places and data sets that are unique in space and time, with only single realisations of any observational errors (see Beven, 1981). Except in special circumstances, we cannot replicate sets of observations, so any such 5% error criterion would have to be assessed in some different way.
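One way of seeing the problem is through a simple numerical experiment. The Python sketch below is purely illustrative: the discharge series, the two candidate models, the assumed observation error, and the Nash-Sutcliffe efficiency threshold are all invented for the example, and the many error realisations are available only because the data are synthetic, whereas a real catchment offers just one. It shows how the particular realisation of observational error can decide whether a model passes or fails a fixed acceptance criterion.

```python
import numpy as np

rng = np.random.default_rng(1)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, lower values are worse."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Synthetic "true" discharge with two storm peaks (purely illustrative)
t = np.arange(100)
true_q = 1.0 + 8.0 * np.exp(-((t - 30) / 10.0) ** 2) + 5.0 * np.exp(-((t - 70) / 8.0) ** 2)

# Two candidate models: a modest bias (the "good" model) and a larger one
good_sim = 1.10 * true_q
poor_sim = 1.20 * true_q + 0.2

threshold = 0.65   # arbitrary acceptance threshold on NSE, chosen for illustration
sigma_obs = 1.5    # assumed standard deviation of discharge observation error

type_i = type_ii = 0
n_trials = 2000
for _ in range(n_trials):
    obs = true_q + rng.normal(0.0, sigma_obs, size=t.size)  # one error realisation
    if nse(obs, poor_sim) >= threshold:
        type_i += 1    # poor model accepted
    if nse(obs, good_sim) < threshold:
        type_ii += 1   # good model rejected

print(f"Type I rate  (poor model accepted): {type_i / n_trials:.1%}")
print(f"Type II rate (good model rejected): {type_ii / n_trials:.1%}")
```

In practice, of course, such error rates cannot be estimated from a single realisation of the observations, which is exactly why a fixed significance level is hard to carry over to unique places and data sets.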
Consider each of the possibilities for being wrong. We would wish to avoid Type I errors in testing
because the performance of models falsely retained for use in prediction might lead to false inferences
and poor decision making. So how to avoid Type I errors? The primary reason a Type I error might occur is that there is enough uncertainty in the inputs to the model and the observational data used in evaluation that whatever performance indicator is used to judge model acceptability cannot differentiate between good and poor models. This might be because of a particular or peculiar
sequence of observational errors, including “rogue” observations. To avoid Type I errors, we need to be
careful about the commensurability of observed and model variables and try to ensure that only periods
of good quality data are used in calibration.
It is perhaps more important still to avoid Type II errors. If we do retain a poor model for use in prediction (a Type I error), further evaluation in the future may reveal that it gives poor predictions, so our choice can later be refined. But we really would not want to eliminate a good model just because of input errors, since a model once rejected is unlikely to be reconsidered. However, again because of particular or peculiar sequences of observational errors, it may be difficult to distinguish models that are good in prediction from those that are not. This might also be the reason why a model that performs well in calibration does less well in testing, purely because of different input error characteristics. Again, it is necessary to be careful about
commensurability of observed and model variables and ensure that only periods of good quality data are
used in calibration or validation.
So is there any possibility of distinguishing between Type I and Type II errors in this type of model
application? Some error is inevitable and almost certainly not reducible to statistical noise, while the characteristics of the different sources of error in calibration might well differ from those in prediction. It is therefore difficult to avoid both Type I and Type II errors, but there is a good case for rejection when a model performs poorly and we believe that the observational data are adequate. The conclusion is therefore that we should use a rejectionist approach to hypothesis testing, while guarding against Type II errors by excluding periods of low quality or "rogue" observations (see Section 7.17).
This then suggests that, to achieve some objectivity in hypothesis testing, the criteria for model rejection need to be set independently of any model run. One way of doing so is the "limits of acceptability" approach suggested in Section 7.10. This is a form of possibilistic rather than probabilistic model evaluation. To avoid rejecting a good model because of poor input data, the limits of acceptability should be set so as to reflect the potential effects of input, boundary condition, and observation error, and should be set prior to running the model.
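A minimal sketch of how such an evaluation might be coded is given below. The ±20% error bounds, the candidate simulations and the required fraction of time steps within the limits are all hypothetical choices made for illustration; in a real application, the limits would be derived from what is known about input, boundary condition, and observation errors before any model run.

```python
import numpy as np

# Hypothetical prior limits of acceptability: here simply +/-20% of each
# observed value, standing in for limits that would in practice reflect
# input, boundary condition and observation error, fixed before any model run.
obs = np.array([2.1, 3.4, 7.9, 12.3, 8.8, 5.2, 3.0])
lower, upper = 0.8 * obs, 1.2 * obs

def fraction_within_limits(sim, lower, upper):
    """Fraction of time steps at which the simulation lies inside the limits."""
    return float(np.mean((sim >= lower) & (sim <= upper)))

# Two candidate simulations (illustrative numbers only)
candidates = {
    "model A": np.array([2.3, 3.1, 8.5, 11.0, 9.5, 5.0, 2.9]),
    "model B": np.array([3.5, 4.9, 6.1, 16.8, 12.4, 7.0, 4.1]),
}

min_fraction = 0.9   # arbitrary retention criterion for this sketch
for name, sim in candidates.items():
    f = fraction_within_limits(sim, lower, upper)
    status = "retained" if f >= min_fraction else "rejected"
    print(f"{name}: {f:.2f} of time steps within limits -> {status}")
```

Note that the scoring here is possibilistic: each model is judged directly against the prior limits, rather than against a probability of the observations given the model.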
The set of models which consistently fail such limits of acceptability tests should be rejected as
hypotheses, even if that means rejecting all the models tried. This is, of course, a positive result because