Statistical Criteria. If the model output variable is
normally distributed, then the agreement between
measured and predicted values can be evaluated
using classical statistical measures, such as the
mean and standard deviation. For lognormally dis-
tributed variables, the log-transform should be
used prior to estimating the mean and the vari-
ance. For variables that are not normally or log-
normally distributed, the measured and predicted
values should be compared using nonparametric
techniques, such as the median, range, interquar-
tile range, and median absolute deviation.
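As an illustration, the following sketch computes both the classical and the nonparametric summary measures for a single variable, assuming NumPy and SciPy are available; the array names measured and predicted are hypothetical and not taken from the text.

import numpy as np
from scipy import stats

def summarize(values, lognormal=False):
    """Summary statistics for one variable (measured or predicted)."""
    # Log-transform first when the variable is lognormally distributed.
    x = np.log(values) if lognormal else np.asarray(values, dtype=float)
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    return {
        "mean": x.mean(),                      # classical measures
        "std": x.std(ddof=1),
        "median": q2,                          # nonparametric measures
        "range": x.max() - x.min(),
        "iqr": q3 - q1,
        "mad": stats.median_abs_deviation(x),  # median absolute deviation
    }

measured = np.array([1.2, 2.3, 3.1, 4.8, 5.0])   # illustrative values only
predicted = np.array([1.0, 2.6, 2.9, 4.5, 5.4])
print(summarize(measured))
print(summarize(predicted))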
Residual Error Analysis. Measures used in residual
error analysis include maximum error (ME), nor-
malized root mean square error (RMSE), coeffi-
cient of determination (CD), modeling efficiency
(EF), and the coefficient of residual mass (CRM).
When model predictions exactly match measured
values, ME, RMSE, and CRM are all zero and the
CD and EF are unity. These goodness-of-fit mea-
sures are appropriate for moderate- to large-size
data sets that are normally distributed and free
from outliers (Mulla and Addiscott, 1999).
Nonparametric alternatives are available for the
RMSE, CD, and EF (Zacharias et al., 1996). There
is no consensus on the critical values of these
statistical criteria that should be used to define
unacceptable levels of model accuracy.
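The following sketch computes these five measures under one commonly used set of definitions (normalized RMSE expressed as a percentage of the observed mean); the exact formulas used in the chapter may differ, and obs and pred are hypothetical paired arrays.

import numpy as np

def residual_error_stats(obs, pred):
    """ME, normalized RMSE (%), CD, EF, and CRM for paired data."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    obar = obs.mean()
    me   = np.max(np.abs(pred - obs))                                  # maximum error
    rmse = np.sqrt(np.mean((pred - obs) ** 2)) * 100.0 / obar          # normalized RMSE (%)
    cd   = np.sum((obs - obar) ** 2) / np.sum((pred - obar) ** 2)      # coefficient of determination
    ef   = 1.0 - np.sum((pred - obs) ** 2) / np.sum((obs - obar) ** 2) # modeling efficiency
    crm  = (obs.sum() - pred.sum()) / obs.sum()                        # coefficient of residual mass
    return {"ME": me, "RMSE": rmse, "CD": cd, "EF": ef, "CRM": crm}

# With a perfect match, ME, RMSE, and CRM are 0 and CD and EF are 1.
print(residual_error_stats([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))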
Graphical Comparisons. Statistical criteria for evalu-
ating model predictions are of limited usefulness
for small data sets, for moderate-size data sets
with a few outliers, and for data sets in which
multiple observations are made at several suc-
cessive times. In these cases, model performance
may be evaluated using graphical comparisons
of observed versus predicted values. A variety of
graphical comparisons are possible, depending on
the type of model output. These include the
observed versus predicted mean values at differ-
ent times, the observed versus predicted median
values with error bars for the interquartile range
at different times, and observed versus predicted
cumulative distribution functions or exceedance
probabilities. When making graphical compari-
sons, it is recommended that statistical and
goodness-of-fit performance measures be pre-
sented on the graph.
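The sketch below illustrates two such comparisons, an observed-versus-predicted scatter plot with a 1:1 line and a pair of empirical cumulative distribution functions, with a goodness-of-fit statistic (EF) annotated on the graph; matplotlib is assumed and all values are placeholders.

import numpy as np
import matplotlib.pyplot as plt

obs  = np.array([1.1, 2.0, 2.9, 4.2, 5.1])   # illustrative values only
pred = np.array([1.0, 2.3, 2.7, 4.5, 5.6])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

# Scatter of observed vs. predicted values with the 1:1 line.
ax1.scatter(obs, pred)
lims = [min(obs.min(), pred.min()), max(obs.max(), pred.max())]
ax1.plot(lims, lims, "k--", label="1:1 line")
ax1.set_xlabel("Observed"); ax1.set_ylabel("Predicted"); ax1.legend()

# Empirical cumulative distribution functions of both data sets.
for data, label in [(obs, "Observed"), (pred, "Predicted")]:
    x = np.sort(data)
    ax2.step(x, np.arange(1, x.size + 1) / x.size, where="post", label=label)
ax2.set_xlabel("Value"); ax2.set_ylabel("Cumulative probability"); ax2.legend()

# Recommended practice: show a goodness-of-fit measure on the graph itself.
ef = 1.0 - np.sum((pred - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)
ax1.set_title(f"EF = {ef:.2f}")
plt.tight_layout()
plt.show()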
Hypothesis Testing. Typically, the null hypothesis is
that there is no difference between measured and
predicted values, and a variety of statistical tests
are available to test the null hypothesis. For nor-
mally distributed variables, these include the two-
sample paired t-test, the factor-of-f test, or the
Kolmogorov-Smirnov test (Snedecor and Cochran,
1980). For nonnormally distributed variables, the
nonparametric Wilcoxon rank sum test (Hollander
and Wolfe, 1973) may be used.
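A sketch of these tests using SciPy is shown below; the samples are hypothetical, and the factor-of-f test is omitted because its form is not given here.

import numpy as np
from scipy import stats

obs  = np.array([1.1, 2.0, 2.9, 4.2, 5.1, 6.3])   # illustrative values only
pred = np.array([1.0, 2.3, 2.7, 4.5, 5.6, 6.0])

# Normally distributed variables: two-sample paired t-test.
t_stat, t_p = stats.ttest_rel(obs, pred)

# Kolmogorov-Smirnov test comparing the two distributions.
ks_stat, ks_p = stats.ks_2samp(obs, pred)

# Nonnormally distributed variables: Wilcoxon rank sum test.
w_stat, w_p = stats.ranksums(obs, pred)

for name, p in [("paired t", t_p), ("K-S", ks_p), ("rank sum", w_p)]:
    print(f"{name}: p = {p:.3f} -> {'reject' if p < 0.05 else 'do not reject'} H0")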
Linear Regression. One of the simplest, but least
rigorous, approaches for model calibration and
validation is the use of ordinary least squares
regression of model predictions versus experimen-
tal observations. There are three features of the
resulting statistical analysis that are important: the
slope of the regression line should be near unity,
the intercept of the regression line should be near
zero, and the correlation coefficient (r) should be
significantly different from zero and preferably
near unity. Together, the values for the intercept
and slope of the regression line are an indication
of the bias in the model predictions. For example,
a slope much greater than unity and an intercept
much less than zero indicate that the model under-
predicts small values and overpredicts large values.
A t -test can be used to determine whether the
slope of the regression line is significantly differ-
ent from unity. Linear regression analysis is often
misused, and one of the most common misuses is
to evaluate the accuracy of the model based only
on information for the value of the correlation
coefficient (r). The correlation coefficient by itself
is an almost meaningless criterion of model accuracy.
For example, consider the case in which the regres-
sion line has a slope of zero and an r value of
0.9. Although the r value is large, the model has
no predictive ability over a wide range of condi-
tions. This lack of predictive ability may be due
to inaccurate representation of the dominant trans-
port processes by the model. When using linear
regression as a criterion of model performance, the
slope, intercept, and r-value from the regression
line should be evaluated together before drawing
any conclusions about model accuracy.
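The sketch below illustrates this approach with scipy.stats.linregress and a t-test of the null hypothesis that the slope equals unity, using the slope's standard error and n − 2 degrees of freedom; the data are illustrative only.

import numpy as np
from scipy import stats

obs  = np.array([1.1, 2.0, 2.9, 4.2, 5.1, 6.3, 7.0])   # illustrative values only
pred = np.array([1.0, 2.3, 2.7, 4.5, 5.6, 6.0, 7.4])

# Ordinary least squares regression of predictions on observations.
res = stats.linregress(obs, pred)
print(f"slope = {res.slope:.2f}, intercept = {res.intercept:.2f}, r = {res.rvalue:.2f}")

# t-test of H0: slope = 1, using the standard error of the slope.
n = obs.size
t = (res.slope - 1.0) / res.stderr
p = 2.0 * stats.t.sf(abs(t), df=n - 2)
print(f"H0 slope = 1: t = {t:.2f}, p = {p:.3f}")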
11.3.2.1  Error Statistics.  Most models used in water-
resources modeling can be expressed in the generic
form
Y = f(X, θ)    (11.7)
where Y = {y_n; n = 1 . . . N} is the response matrix
containing directly observable quantities at a series of
times t = {t_n; n = 1 . . . N}, X = {x_n; n = 1 . . . N} is the
forcing input matrix containing forcing variables at the
same series of times, and θ = {θ_n; n = 1 . . . P} are the
model parameters. Model parameters, θ, are either physical
or conceptual. Physical parameters are those parameters