appear reasonable in terms of goodness-of-fit statistics, it has been shown that not all of them will
necessarily make physical sense, that is, represent the conceptual underpinnings of the natural
system being examined (Beriro et al., 2012b, 2013). Depending on the problem being investigated,
the computational effort required to run multiple models can be considerable, sometimes exceeding the
resources available for a project. If this is the case, then the reader is encouraged to consider PC
(Adnan et al., 2014) as an option rather than omit this step from their experimental design, since it
allows multiple runs to be performed simultaneously.
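Assuming PC here refers to parallel computing, one practical realisation is to distribute independent GP runs across processor cores. The sketch below is a minimal illustration using Python's standard multiprocessing module; run_gp_model() is a hypothetical placeholder for whatever single-run routine a given study actually uses.

```python
# Minimal sketch: distributing independent GP runs across CPU cores using
# Python's standard multiprocessing module. run_gp_model() is a hypothetical
# placeholder for one model run and simply returns a dummy fitness value.
from multiprocessing import Pool


def run_gp_model(seed):
    """Placeholder for a single GP run initialised with a given random seed."""
    # ... configure and execute one GP run here ...
    fitness = 0.0  # dummy goodness-of-fit value
    return seed, fitness


if __name__ == "__main__":
    seeds = range(10)                    # ten independent runs
    with Pool(processes=4) as pool:      # four worker processes
        results = pool.map(run_gp_model, seeds)
    print(results)
```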
Further details and more technical information on the mechanics of GP can be found in a variety
of sources (e.g. Ferreira, 2006a; Koza, 1990; Poli et al., 2008; Schmidt et al., 2011).
8.3.4 Stage 4 Rejecting and Accepting Models
The decision to accept or reject a model is based on an evaluation of both its reasonableness and
its rationality. Reasonableness can be tested quantitatively by calculating the error exhibited by
a solution using goodness-of-fit metrics or by presenting scatterplots that depict model accuracy.
Rationality can be investigated using sensitivity analysis. What is advocated here is that, rather than
relying on a single tool to decide whether to accept or reject a model, multiple lines of evidence
should be gathered from a variety of sources and methods.
Quantitative analysis of model accuracy is undertaken as a validation exercise using goodness-
of-fit metrics. You will find that the equations used for this calculation are often the same as the error
metrics used to select individual chromosomes for optimisation during model development. Their
application here, however, serves a different purpose, that is, one of model acceptance or rejection.
In any case, choosing which measure to use should be informed in part by the problem being exam-
ined. Some metrics, like R-squared, are dimensionless, while others, like RMSE, are expressed in
the same units as the dependent variable. When solutions represent non-linear problems, as
is often the case in GP investigations, R-squared may actually be an inappropriate metric (Kvalseth,
1985). Moreover, statistical measures deliver crisp numerical descriptors, while the acceptance cri-
teria adopted by a modeller are often more arbitrary and subjective. For example, some
researchers might consider an R-squared value of 0.8 to represent a good model, while others might
prefer 0.7. An alternative, or complementary, approach is to benchmark the performance of evolved
models against similar data-driven models, often created as part of the same study but developed using
different techniques. As a result, it is common to see more than one data-driven model being used
in a study and the best, or winning, model selected using goodness-of-fit statistics alone. The underly-
ing principle in either case is to provide a summary statistic of the difference between a
predefined set of observed measurement records and their corresponding model-predicted outputs.
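As an illustration of how such summary statistics are computed, the sketch below calculates RMSE and R-squared for a set of paired observed and modelled values; the data arrays are invented for illustration only.

```python
# Minimal sketch of two common goodness-of-fit metrics used for validation.
# The observed and modelled arrays are illustrative values only.
import numpy as np

observed = np.array([2.1, 3.4, 4.8, 6.0, 7.2])
modelled = np.array([2.3, 3.1, 5.0, 5.7, 7.5])

residuals = observed - modelled

# RMSE is expressed in the same units as the dependent variable.
rmse = np.sqrt(np.mean(residuals ** 2))

# R-squared (coefficient of determination) is dimensionless.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((observed - observed.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"RMSE = {rmse:.3f}, R-squared = {r_squared:.3f}")
```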
Scatterplots provide a good visual snapshot of the performance of a model, where observed val-
ues are plotted against modelled ones; for better models the points cluster more tightly along a straight line
and conform more closely to a one-to-one line inserted on the plot. Another useful tool, which is
more commonly used in time-series modelling, is residual analysis. This is where the difference
between observed and modelled values is plotted and reviewed for obvious patterns, the ideal solu-
tion providing no discernible pattern in the scatter cloud.
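A minimal sketch of both visual checks is given below, assuming matplotlib is available; the data arrays are again illustrative values only.

```python
# Minimal sketch of the two visual checks described above: an observed-versus-
# modelled scatterplot with a one-to-one line, and a residual plot that ideally
# shows no discernible pattern. The data arrays are illustrative values only.
import numpy as np
import matplotlib.pyplot as plt

observed = np.array([2.1, 3.4, 4.8, 6.0, 7.2])
modelled = np.array([2.3, 3.1, 5.0, 5.7, 7.5])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

# Observed versus modelled: better models cluster tightly around the 1:1 line.
ax1.scatter(observed, modelled)
lims = [min(observed.min(), modelled.min()), max(observed.max(), modelled.max())]
ax1.plot(lims, lims, "k--", label="1:1 line")
ax1.set_xlabel("Observed")
ax1.set_ylabel("Modelled")
ax1.legend()

# Residuals: review the scatter cloud for obvious structure.
ax2.scatter(observed, observed - modelled)
ax2.axhline(0.0, color="k", linestyle="--")
ax2.set_xlabel("Observed")
ax2.set_ylabel("Residual (observed minus modelled)")

plt.tight_layout()
plt.show()
```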
One-at-a-time response function sensitivity analysis is a tool that can be used to determine
whether an evolved solution is a rational representation of the system being studied (Beriro et al.,
2013). One of the advantages of this approach is expediency, as has been shown on a number of
occasions (e.g. Alkroosh and Nikraz, 2011; Kayadelen et al., 2009; Luan et al., 2008; Taskiran,
2010). The analysis is usually completed by creating a series of artificial datasets, one for each
independent variable expressed in the model. Each dataset differs from the next in terms of the
variable under test, which is varied in increments across the range between its minimum and maxi-
mum values; all other variables are held at their mean. The model is then applied to each dataset
and the calculated outputs plotted as response function curves, which are then compared with cross-
correlations derived from the original data. If the plots for each variable reveal a similar strength
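To make the procedure concrete, the sketch below builds one synthetic dataset per independent variable, varying that variable in equal steps between its minimum and maximum while holding the others at their means, then evaluates a model over each dataset to obtain response function curves. The model() function, variable names, and data are hypothetical placeholders rather than any evolved solution discussed above.

```python
# Minimal sketch of one-at-a-time response function sensitivity analysis.
# 'data' holds illustrative independent variables and model() is a hypothetical
# placeholder standing in for the evolved GP solution under test.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.DataFrame({"x1": rng.uniform(0.0, 10.0, 100),
                     "x2": rng.uniform(-3.0, 3.0, 100)})

def model(X):
    """Placeholder expression standing in for an evolved GP model."""
    return 0.5 * X["x1"] + np.sin(X["x2"])

response_curves = {}
for var in data.columns:
    # Hold every variable at its mean, then vary only the variable under test
    # in equal increments between its minimum and maximum values.
    synthetic = pd.DataFrame({c: np.full(50, data[c].mean()) for c in data.columns})
    synthetic[var] = np.linspace(data[var].min(), data[var].max(), 50)
    response_curves[var] = (synthetic[var].to_numpy(), model(synthetic).to_numpy())

# response_curves now maps each variable to (input values, model response),
# ready to be plotted and compared with correlations in the original data.
```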