of the Popperian asymmetry of learning by falsification: i.e. we learn more from a hypothesis that is
falsified than from one that cannot yet be rejected on the basis of the available observations (although
Popper did allow that there might be degrees of verisimilitude and falsification). If a model as hypoth-
esis satisfies some (relaxed) limits of acceptability then there is no guarantee that it is getting the right
results for the right reasons. If it is not doing so, then it may still fail in prediction (although such models are
necessarily still our best bet in making predictions). We can only continue to test for rejection as new data
become available.
On the other hand, if a model fails the limits of acceptability (and we are happy that we have taken
adequate account of uncertainties in the input data so as to avoid Type II errors) then we have eliminated a
model from the prediction process. If all the models tried fail such a test, then there might be information
in the periods or types of failure that would suggest how a model might be improved, resulting in a
positive learning process, albeit with work still to be done.
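As an aside on how such a test might look in practice, the following is a minimal sketch (in Python) of a limits-of-acceptability check for a single model run. The ±20% bounds and the strict rule that every time step must lie within its limits are illustrative assumptions only; in an application the limits would be set from an assessment of the observational and input uncertainties, and the acceptance rule might be relaxed.

```python
import numpy as np

def limits_of_acceptability_test(sim, lower, upper):
    """Evaluate one model run against per-time-step limits of acceptability.

    sim, lower, upper : 1-D arrays of equal length, where lower and upper
    bracket the acceptable range around each evaluation observation
    (reflecting the uncertainty in that observation).

    Returns (accepted, failed_steps): accepted is True only if every
    simulated value lies within its limits; failed_steps records where the
    run fails, which may hint at how the model might be improved.
    """
    sim, lower, upper = map(np.asarray, (sim, lower, upper))
    outside = (sim < lower) | (sim > upper)
    failed_steps = np.flatnonzero(outside)
    return failed_steps.size == 0, failed_steps

# Illustrative use with arbitrary synthetic values:
obs = np.array([1.0, 2.5, 4.0, 3.0, 1.5])
lower, upper = 0.8 * obs, 1.2 * obs      # e.g. +/-20% observation uncertainty
sim = np.array([1.1, 2.4, 5.0, 2.9, 1.4])
accepted, failed = limits_of_acceptability_test(sim, lower, upper)
print(accepted, failed)                  # False [2] -> rejected at the peak time step
```

Recording which time steps fail, and in which direction, is what allows the periods or types of failure to inform model improvement rather than simply discarding the run.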
Given these (rather serious) issues, the problem is how to define when a model is acceptable, conditional
on some imperfect evaluation data. This might be different for scientific research and for practical
applications. For scientific research, hypothesis testing (ideally) should be posed in terms of falsifiability
as discussed above. But hydrology is also a practical science and, for practical applications, what is
needed of a model is that it should provide predictions of future outcomes that lead to good management
decisions (or, at least, do not lead to poor management decisions). Any conditioning or evaluation process
given some imperfect data is only a means to that end. We want to avoid Type I errors but the potential
for epistemic errors means that there will be limitations as to how far success in prediction can be
guaranteed by success in calibration or conditioning. Such errors are a form of knowledge uncertainty
(and, when referring to errors that could not be known beforehand, are sometimes called Type III errors).
The assumption that error characteristics in prediction are the same as in calibration is a convenience
when we actually expect nonstationarity of sources of uncertainty.
These are some of the issues that underlie the testing of models as hypotheses about system functioning.
The dialogue between experimentalists and modellers now needs to proceed to the stage of agreeing what
might constitute appropriate limits of acceptability or other principles on which adequate hypothesis tests
might be based. A key question, therefore, is just how we can define an adequate hypothesis test that
would, in fact, identify models that were getting the right result for the right reasons (Klemes, 2000;
Kirchner, 2006; Beven, 2010) or, indeed, pose the right sorts of question (see Sivapalan, 2009). We have
mostly been trained to consider that hypothesis testing should normally involve a statistical analysis and
there is certainly no shortage of statistical techniques for model choice and model testing. The use of
such methods, however, implies that the sources of uncertainty that affect model performance can be
considered as if they are random. The problem in hydrological modelling is that there are uncertainties
arising from lack of knowledge or understanding that it might be misleading to treat as if they were
random. In Chapter 7, these two types of error were introduced as aleatory and epistemic errors.
The recognition of epistemic errors has two important implications. One is that it cannot be assured that
the nature of the errors in prediction is the same as in calibration. Of course, we have no idea what the nature
of such errors might be in prediction, so it is usually only possible to assume that they are, in some sense,
similar to those of the calibration period. The second implication is that the value of the information
in calibration might be less than that implied by calculating the type of formal statistical likelihoods
shown in Box 7.1. In particular, some residual errors in calibration might be disinformative about what
constitutes a good hypothesis of the system and consequently introduce bias into any calibration process
(see, for example, Beven et al., 2008; Beven and Westerberg, 2011; Section 7.17).
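To illustrate the point, the following sketch assumes independent Gaussian residual errors as one simple example of a formal likelihood (the forms actually used are those of Box 7.1) and shows how a short run of disinformative residuals, such as might result from a storm whose rainfall was missed by the gauges, can dominate the value of such a formal measure. The synthetic residuals and the way the disinformative steps are flagged here are purely illustrative assumptions.

```python
import numpy as np

def gaussian_log_likelihood(residuals, sigma):
    """Log-likelihood of the residuals under independent Gaussian errors (std sigma)."""
    n = residuals.size
    return (-0.5 * n * np.log(2.0 * np.pi * sigma**2)
            - 0.5 * np.sum(residuals**2) / sigma**2)

rng = np.random.default_rng(1)
residuals = rng.normal(0.0, 1.0, size=200)    # well-behaved calibration residuals
residuals[50:55] += 8.0                       # a short disinformative event, e.g. a storm
                                              # whose rainfall was not seen by the gauges

full = gaussian_log_likelihood(residuals, sigma=1.0)
screened = gaussian_log_likelihood(np.delete(residuals, np.arange(50, 55)), sigma=1.0)
print(full, screened)   # a handful of disinformative steps dominates the formal measure
```

In a formal conditioning process, those few residuals would heavily penalise, and bias the calibration of, a model that may in fact be a perfectly good hypothesis, which is why the identification and treatment of disinformative periods matters before any formal likelihood is applied.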
It may not be sufficient to test model performance purely on the basis of the output observations from
a catchment system. It would be more rigorous to also test whether the model can predict the changing
internal states of the catchment adequately. This raises three further problems in hypothesis testing. The
first is the possibility of incommensurability between observed state variables at a certain scale in space
and time and the equivalent model state variables, often predicted at different space and time scales