Environmental Engineering Reference
The forms of the external and internal model discrepancy traces are very different, as they are measuring different things. The light grey trace, which shows the sum of the total model discrepancy and the measurement error, is of primary interest in assessing model adequacy. Even though it is large in places, it is much smaller than the overall range of both the observed and model discharge, suggesting that the model is mostly adequate for describing discharge, except during periods of heavy rainfall, when the light grey trace spikes. This is particularly evident in the residual plot displayed in the lower panel.
26.4 Slow computer models

The above analysis was based on an ability to modify the computer code and to carry out very many evaluations of the model. We now describe how to modify our analysis when neither of these conditions applies. For purposes of comparison, we reanalyse the runoff model of Section 26.3, but we now suppose that we have no access to the computer code and that the runoff model has a long run time. Therefore, we used only 250 carefully chosen training runs with which to build an emulator of the computer-code implementation of the model. As discussed in Section 26.2, an emulator is a fast stochastic approximation of the model. We can evaluate the expectation and variance of the emulator: the former mimics the behaviour of the model while the latter represents our uncertainty in the approximation (see, for example, Craig et al., 1997; Craig et al., 2001; Kennedy and O'Hagan, 2001; O'Hagan, 2006 and MUCM, 2009).

To illustrate emulation, we consider the logarithm of the discharge at each of the 13 equally spaced hours 100, 160, ..., 760, 820. The following procedure is used to construct an emulator for the logarithm of discharge at each of these 13 hours:

1. We select a Latin hypercube of 250 points over the 17 functionally independent inputs and run the model at each of them. To construct a Latin hypercube of n points, the range of each of the inputs is divided into n equal intervals, and the points are then chosen randomly so that no two points occupy the same interval for any of the inputs (see, for example, MUCM, 2009).

2. Next we fit a linear model of the form of Equation 26.12 to the logarithm of the 250 model discharges, using the lm function in R (see the R Development Core Team, 2008). Each model input choice $x$ has 17 components $x^{(1)}, \ldots, x^{(17)}$ and, in the first instance, we choose $g_j(x) = x^{(j)}$ for $j = 1, \ldots, 17$ and $g_0(x) = 1$.

3. We then use the step function in R to carry out a backward step-wise selection procedure to identify a subset of active inputs $x_a$ of the inputs $x$ that account for a high percentage of the total variation in the logarithm of model discharge in relation to the fitted model. A further reduction of the subset can be achieved by removing statistically significant inputs which otherwise have little practical impact on model output. For simplicity, we kept the same number of active inputs for each output, and found that 12 inputs were sufficient, although a different 12 were chosen for each of the 13 outputs.

4. We then fit a quadratic in the active inputs determined in step 3; that is, with the $g(\cdot)$ in Equation 26.12 of the form $g_{ij}(x_a) = x_a^{(i)} x_a^{(j)}$ for $0 < i \le j$, where $g_{00}(x_a) = 1$ and $g_{0j}(x_a) = x_a^{(j)}$.

5. If the multiple $R^2$ for the fitted quadratic model is substantial, in excess of 90% say, then it should be a useful predictor of model output at untried inputs. However, as an emulator of the model, the quadratic regression fit will not agree with the model outputs at the 250 inputs. As explained after Equation 26.12, current emulator research treats the residuals as a 'smooth' random process instead of the 'rough' residuals from the quadratic regression fit, acknowledging that the model is likely to be a continuous, differentiable function of the inputs $x$. Thus, the emulator for a single output $f(x)$ of the runoff model evaluated at $x$ has the form:

$$f(x) = \sum_{0 \le i \le j \le 12} \beta_{ij}\, x_a^{(i)} x_a^{(j)} + u(x) \qquad (26.12)$$

(taking $x_a^{(0)} = 1$, in line with the definitions of $g_{00}$ and $g_{0j}$ in step 4). Furthermore, we usually decompose $u(x)$ to be of the form $u(x) = f_t(x_a) + \nu(x)$, where $\nu(x)$, called a 'nugget' residual, accounts for the absence of variation due to the inactive inputs: two different inputs $x$ and $x'$ may have the same values for their active input components. We assume that $\nu(x)$ has zero expectation and variance $\delta\sigma^2$ for all $x$, and $\nu(x)$ and $\nu(x')$ are uncorrelated, unless $x = x'$ when they are perfectly correlated: we take $\delta = 0.05$. The other residual component $f_t(x_a)$ has zero expectation and variance

6. The actual emulator for the computer model at any input $x$ is obtained by assessing (a) $\sigma^2$ to equal the residual mean square from the least-squares fit to Equation 26.12; (b) the $\beta_{ij}$ to equal their least-squares estimates; and (c) the variances and covariances of the $\beta_{ij}$ to equal their estimated values resulting from the least-squares fit.
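The Latin hypercube construction described in step 1 can be sketched in a few lines. The following is a minimal illustration only, not the design code used for the runoff model: the function names, the unit input ranges and the seed are all assumptions made for the example, and only the Python standard library is used.

```python
import random


def latin_hypercube(n_points, bounds, seed=None):
    """Sample n_points so that, for every input, each of the n equal
    intervals of its range contains exactly one point.

    bounds is a list of (low, high) pairs, one per input (hypothetical ranges).
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = (high - low) / n_points
        # Draw one value uniformly inside each of the n equal intervals.
        values = [low + (k + rng.random()) * width for k in range(n_points)]
        # Shuffle so the interval order is independent from input to input.
        rng.shuffle(values)
        columns.append(values)
    # Transpose the columns: row r is the r-th design point.
    return [list(point) for point in zip(*columns)]


def interval_index(value, low, high, n_points):
    """Index (0-based) of the equal-width interval that contains value."""
    k = int((value - low) / (high - low) * n_points)
    return min(k, n_points - 1)


# Assumed set-up: 250 points over 17 inputs, each rescaled to [0, 1].
bounds = [(0.0, 1.0)] * 17
design = latin_hypercube(250, bounds, seed=1)
```

Each row of `design` is one input choice at which the model would be run. In practice each column would be rescaled from the unit interval to the physical range of the corresponding input.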
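To make the quadratic form used in the emulator procedure concrete, the sketch below enumerates the regression terms for a vector of active inputs, using the convention that index 0 stands for the constant 1. It is an illustrative sketch under assumptions: the function names and the coefficient values in the usage lines are hypothetical, and in the procedure above the coefficients would come from the least-squares fit rather than being set by hand.

```python
from itertools import combinations_with_replacement


def quadratic_basis(x_active):
    """Return {(i, j): g_ij} for 0 <= i <= j <= p, where index 0 is the
    constant term, so g_00 = 1 and g_0j is the j-th active input."""
    padded = [1.0] + list(x_active)  # position 0 holds the constant 1
    indices = range(len(padded))
    return {
        (i, j): padded[i] * padded[j]
        for i, j in combinations_with_replacement(indices, 2)
    }


def quadratic_mean(x_active, beta):
    """Regression part of the emulator: the sum of beta_ij * g_ij.
    Coefficients missing from beta are treated as zero."""
    basis = quadratic_basis(x_active)
    return sum(beta.get(key, 0.0) * value for key, value in basis.items())


# Twelve active inputs give 13 * 14 / 2 = 91 regression terms.
n_terms = len(quadratic_basis([0.0] * 12))

# Hypothetical coefficients for a two-input example.
example_beta = {(0, 0): 1.0, (0, 1): 0.5, (1, 2): -1.0}
example_mean = quadratic_mean([2.0, 3.0], example_beta)
```

With 12 active inputs the quadratic has 91 terms, which is well within what 250 training runs can support in a least-squares fit.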