The task of comparing the model output with independent datasets can be reduced
to a fundamental question of statistics, namely whether two samples (the result of
the model and the independent data) are derived from the same population. Evidently,
this is a further source of uncertainty and even error in the qualitative assessment of
the model. Applying statistical tests to data can be viewed as modelling the
underlying properties of their distributions. Such testing therefore implies an
additional step of abstraction and a reduction of the information in the data. In fact,
what one does is to model the independent data and extract certain test statistics, which
are then compared with the corresponding model of the data as generated by a model using
ecological data. The more links such a chain of data manipulation has, the more
caution is necessary in interpreting its final results.
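The "same population" question can be made concrete with a small sketch. The text does not name a particular test; the two-sample Kolmogorov-Smirnov statistic with a permutation p-value is used here purely as an assumed illustration, and the sample values and the names `ks_statistic` and `permutation_p_value` are hypothetical.

```python
from bisect import bisect_right
import random


def ks_statistic(a, b):
    """Largest vertical gap between the empirical CDFs of two samples."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # Both CDFs are step functions, so the largest gap occurs at a data point.
    for v in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, v) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def permutation_p_value(a, b, n_perm=1000, seed=0):
    """Share of random relabellings whose gap is at least the observed one."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical values, for illustration only (not taken from the text).
model_output = [2.1, 2.4, 2.2, 2.8, 2.5, 2.3, 2.6, 2.7]
observations = [2.0, 2.9, 2.3, 2.6, 3.1, 2.2, 2.5, 2.4]

print("D =", ks_statistic(model_output, observations))
print("p =", permutation_p_value(model_output, observations))
```

Note that the whole comparison collapses each sample into a single number, which is exactly the reduction of information the paragraph above warns about.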
In applying statistical tests for model evaluation, the user is prone to the usual
errors and pitfalls of statistics. Among these is the risk that the test statistics
underrepresent the actual shape of the data distribution and that, in consequence,
a common population of the samples is wrongly assumed. This phenomenon was
demonstrated by Anscombe (1973), who showed that critical properties of
distributions, such as the mean and variance of a dataset's x and y, the correlation
between x and y, as well as the linear regression line, can be nearly identical in four
vastly different datasets.
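Anscombe's point can be checked directly. The sketch below uses the values of the quartet as they are commonly reproduced from Anscombe (1973) and computes the summary statistics named above with the standard library only; the names `QUARTET`, `summary` and `SUMMARIES` are chosen here for illustration.

```python
from statistics import mean, variance

# Anscombe's quartet; datasets I to III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(x, y):
    """Means, sample variances, correlation and least-squares line of (x, y)."""
    mx, my = mean(x), mean(y)
    vx, vy = variance(x), variance(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    slope = cov / vx
    return {
        "mean_x": mx, "var_x": vx,
        "mean_y": my, "var_y": vy,
        "corr": cov / (vx * vy) ** 0.5,
        "slope": slope, "intercept": my - slope * mx,
    }


SUMMARIES = {name: summary(x, y) for name, (x, y) in QUARTET.items()}
for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The printed summaries agree closely across all four datasets, although a scatter plot of each would show a roughly linear cloud, a curve, a line with one outlier, and a vertical stack with one extreme point.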
Another common problem arises from datasets that violate fundamental assumptions
of the applied statistical tests. For parametric tests this is mainly the assumption
of normality of the data. Normality might be achieved by certain transformation
techniques, but sometimes one has to fall back on non-parametric tests, e.g. when
testing categorical data. Though there is no normality assumption for non-parametric
tests, there are still assumptions about the data distribution that can be violated (see e.g.
Jopp and Lange 2007). Therefore, a generalized “assumption of no assumption” for
non-parametric statistics is not appropriate.
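As a rough illustration of what a transformation can do, the sketch below applies a logarithm to a right-skewed sample and compares the sample skewness before and after. The text does not say which transformations are meant, so the log transform is an assumption, the data are constructed rather than measured, and skewness is only one crude indicator of departure from normality, not a test of it.

```python
import math


def skewness(values):
    """Moment-based sample skewness: third central moment over variance^1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Constructed right-skewed sample (hypothetical; symmetric on the log scale).
raw = [math.exp(z / 4) for z in range(-8, 9)]
transformed = [math.log(v) for v in raw]

SKEW_RAW = skewness(raw)
SKEW_LOG = skewness(transformed)

print("skewness before transformation:", round(SKEW_RAW, 3))
print("skewness after log transformation:", round(SKEW_LOG, 3))
```

A symmetric sample is still not necessarily normal, so a check of this kind can only flag a violated assumption; it cannot confirm that the assumption holds.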
Among users of statistics there seems to be a firm belief in p < 5% for
significant test results. While this is far from being irrational, there are more
important lessons to be learned for the proper application of statistics, especially
regarding natural science data: nothing in statistics is unquestionable, not even the
desire for p < 5% (Stoehr 1999). Instead, it is far more important to always
refer back to the ecological sources of the data and to interpret them according to
the available biological knowledge. There are examples where using any statistics
at all will lead you to the wrong conclusions, because statistics is inherently
ignorant of the involved scientific disciplines. This is the reason why, in some
cases, alternative approaches, such as structural model validation, might be more
appropriate than a purely statistical validation.
Structural Model Validation
Structural validation investigates to what extent the model mechanisms reproduce the
proposed characteristics of the studied ecological context as described by the
conceptual model. Thus the model should not only reproduce the observed system
behaviour but also reflect the causal mechanisms and processes in which the real