Environmental Engineering Reference
2. Evaluation and Comparison of Multi-Model and EPS-Based
Ensembles
In the simulations, the ECMWF Ensemble Prediction System (EPS) was used to drive
long-range atmospheric dispersion models. It consisted of 51 members: one control
run and 50 others to which perturbed initial conditions were applied. For ETEX-1 we
had three different EPS datasets, corresponding to analysis times 64, 40 and 16 h
before the beginning of the release, which we denote for short as the −64, −40 and
−16 meteorological datasets. As dispersion models we used the operational models of
our institutions
(MATCH, MEDIA, NAME, FLEXPART). For each model, the set of simulations driven by
the 51 ECMWF-EPS members was summarized by representative combinations of the
model results, namely the 50th, 75th and 100th percentiles of the distribution of
model-predicted values, plus the average. As the multi-model ensemble we used 25
model results available in the ENSEMBLE system. These models are different
operational long-range atmospheric dispersion systems applied routinely by national
weather or environmental centers in case of a release of harmful volatile substances.
The models normally make use of the national weather forecasts, but in this case
re-analyzed meteorological data were applied, possibly with additional use of
mesoscale weather prediction models. The same ensemble representatives, i.e. the
50th, 75th and 100th percentiles and the average of the model results, were determined.
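The construction of the percentile and average ensembles described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the toy member values and the choice of linear interpolation between order statistics are hypothetical and are not taken from the study, which does not state how its percentiles were computed.

```python
from statistics import mean


def percentile(values, q):
    """Return the q-th percentile (0-100) of values.

    Assumption: linear interpolation between order statistics.
    The source does not say which percentile definition was used.
    """
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def ensemble_representatives(members):
    """Combine member results point by point.

    members: list of member fields, each a list holding one
    concentration per receptor point (hypothetical layout).
    Returns a dict mapping a representative name to a per-point list.
    """
    per_point = list(zip(*members))  # regroup values by receptor point
    return {
        "p50": [percentile(v, 50) for v in per_point],
        "p75": [percentile(v, 75) for v in per_point],
        "p100": [percentile(v, 100) for v in per_point],  # member maximum
        "mean": [mean(v) for v in per_point],
    }


# Toy data: 5 members (instead of 51) at 2 receptor points.
members = [[1.0, 0.0], [2.0, 0.0], [3.0, 4.0], [4.0, 0.0], [5.0, 1.0]]
reps = ensemble_representatives(members)
```

In this sketch the 100th percentile is simply the largest member value at each point, so that representative is the most conservative of the four.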
To evaluate all the created ensembles we applied a number of statistical indicators.
First we analyzed the ROC (Receiver Operating Characteristic) graph, where the
event was defined as exceedance of the threshold 1 × 10⁻¹⁰ g/m³. All the ensembles
had low false positive rates (all below 6%, with an average of 1.3%), but the true
positive rate varied from 30% to 98% (with an average of 72%). This shows that in
general the models produced only a small number of false alarms, but in a few cases
the prediction of true alarms was poor. The comparison based on Euclidean
distance showed that the multi-model simulations performed as well as the
best EPS-based simulations. It can also be seen that simulations based on the
newest meteorological data (−16) performed better than those using older data
(although the −40 based simulations were not better than the −64 ones). The best
results were produced by the 100th percentile (100%) models, which suggests that the
models mostly under-predicted the concentration. This was confirmed by investigating
the Talagrand diagrams. We also calculated the accuracy, which was generally very
high (94-99%), and again both types of ensembles produced similar scores (the
multi-model median actually had the highest score).
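The threshold-based scores discussed above can be illustrated with a short sketch. It assumes that the Euclidean distance is measured in ROC space from the perfect point (false positive rate 0, true positive rate 1), which the text does not state explicitly. The function name and the toy concentrations are hypothetical.

```python
import math


def contingency_scores(predicted, observed, threshold):
    """Compute ROC-type scores for threshold exceedance events.

    An event is a concentration above the threshold. The distance is
    taken to the perfect ROC point (FPR = 0, TPR = 1); this reference
    point is an assumption, not a detail given in the source.
    """
    tp = fp = fn = tn = 0
    for p, o in zip(predicted, observed):
        pred_event = p > threshold
        obs_event = o > threshold
        if pred_event and obs_event:
            tp += 1  # true alarm
        elif pred_event:
            fp += 1  # false alarm
        elif obs_event:
            fn += 1  # missed event
        else:
            tn += 1  # correct rejection
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return {
        "tpr": tpr,
        "fpr": fpr,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "distance": math.hypot(fpr, 1.0 - tpr),
    }


# Toy concentrations in g/m^3 at 8 receptor points (made-up values).
predicted = [5e-10, 2e-10, 0.0, 3e-11, 4e-10, 0.0, 0.0, 0.0]
observed = [3e-10, 0.0, 0.0, 2e-10, 6e-10, 0.0, 0.0, 1e-11]
scores = contingency_scores(predicted, observed, threshold=1e-10)
```

A smaller distance means a better ensemble in this sketch. Accuracy can remain high even when the true positive rate is modest, because correct rejections dominate when most points lie below the threshold.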
As a further evaluation we computed the maximum and the root square errors for
all ensemble datasets. Again the multi-model median got the best score for the global
root square error, but the averages produced results similar to the medians.
It can be observed that for the global root square error the averages of the EPS-based
ensembles have better scores than their medians, while for the multi-model ensemble
the situation is the opposite. These differences, however, are small.
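The two error measures can be sketched as follows. The exact definitions are not given in the text, so this example assumes that the maximum error is the largest absolute difference between prediction and observation, and that the global root square error is the root mean square error over all points. The function names and data are hypothetical.

```python
import math


def max_abs_error(predicted, observed):
    """Largest absolute difference over all points (assumed definition)."""
    return max(abs(p - o) for p, o in zip(predicted, observed))


def root_mean_square_error(predicted, observed):
    """Root mean square error over all points.

    Assumption: the 'global root square error' of the text is read
    here as the RMSE; the source does not give the formula.
    """
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Toy values in arbitrary concentration units (made-up data).
observed = [0.0, 2.0, 4.0, 1.0]
ensemble_median = [0.0, 1.0, 4.0, 3.0]

max_err = max_abs_error(ensemble_median, observed)
rmse = root_mean_square_error(ensemble_median, observed)
```

Applying both functions to each ensemble representative (median, average, other percentiles) would give scores comparable in the way the text describes.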