in terms of being meaningful and representative with regard to the system that they are intended to
model. That said, there are many people who are very comfortable with GP as a black-box model-
ling technique. However, in keeping with the many arguments that are presented throughout this
chapter, the current authors clearly believe that GP has far greater potential than is currently being
exploited. This section provides the reader with a relevant and practical guide for applying GP to
geographical and environmental enquiries. It includes several ideas on the evaluation of GP solu-
tions, in part inspired by the work of others on environmental modelling (e.g. Alexandrov et al.,
2011; Bennett et al., 2012; Jakeman et al., 2006). A summary flow diagram of the five key stages
applicable to any GP experiment is shown in Figure 8.1; each stage is then explained in detail.
8.3.1 Stage 1: Study Site Selection
Selecting a good study site is the first stage of any GP experiment. The site and problem should
be characterised by an appropriate amount of data, sufficiently supporting the research questions
being examined. There is no easy answer to how much data is enough, only to say that there should
not be less than that considered necessary to allow for the conceptual underpinnings of the system
to be understood. For a process such as the flow or level of groundwater in an aquifer, this may
necessitate catchment rainfall, depth to groundwater, topography, physical aquifer properties and/or
other meaningful local or regional factors identified in previously published reports. It would also
be advantageous if the selected data were publicly available since this would support a re-analysis
of the experiment by any interested parties at a later date.
8.3.2 Stage 2: Data Preparation
Understanding how your data reflect the conceptual underpinnings of each problem being investigated is
an important part of data-driven modelling, even if, in order to produce the model, a priori
knowledge about the system or form of the solution is not essential (Beriro et al., 2012b; Mount et
al., 2012). This is generally achieved by asking informative questions pertaining to the quality and
usefulness of the data, such as the following: What is the provenance of the data? How and why
were the data collected? What are their shortcomings, for example, systematic/random errors? What
processes are the data describing? Quantitative descriptions and analyses of the data should also be
made using appropriate statistics such as measures of central tendency, variance, determination of
outliers, cross-correlations and distributions. As well as providing background information on the
problem, a descriptive analysis helps to inform whether pre-processing would be useful. For further
information, Abrahart et al. (2010) present a helpful seven-point checklist that you may also wish to
consult in support of your own data preparation framework.
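The quantitative description outlined above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the function names (describe, pearson, cross_correlation) and the 1.5 x IQR outlier rule are assumptions made for the example, not part of any procedure prescribed in this chapter, and the rainfall and groundwater values are invented.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    # Quartiles are used to build simple 1.5 * IQR outlier fences (an assumed rule).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance
        "outliers": [v for v in values if v < lower or v > upper],
    }


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cross_correlation(x, y, lag):
    """Correlation between x at time t and y at time t + lag (lag >= 0)."""
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])


if __name__ == "__main__":
    # Invented example series: rainfall and a groundwater response one step later.
    rainfall = [0, 1, 0, 2, 0, 3]
    groundwater = [0, 0, 1, 0, 2, 0]
    print(describe(rainfall))
    for lag in (0, 1, 2):
        print(lag, round(cross_correlation(rainfall, groundwater, lag), 3))
```

Inspecting the lagged correlations in this way may help to indicate which inputs, and at what delay, are worth offering to the GP search, although it does not replace the provenance and quality questions listed above.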
Many data processing operations such as normalisation and/or transformation are available to
researchers - each of which could be applied to raw data prior to GP modelling. Specific examples
of such operations include linear rescaling (Alavi and Gandomi, 2011; Mollahasani et al., 2011)
and discrete wavelet transformation (Kisi and Shiri, 2011). Truncating data is another option open to
modellers: undesirable features, such as outliers, can be removed or smoothed automatically, an
option that is available in both Eureqa (Cornell University, 2013) and GeneXproTools (Version 5.0)
(Ferreira, 2013). The potential benefits of such techniques
for GP modelling currently remain unclear. This issue is important, because in principle, GP
has the capability to perform many of these operations automatically, as an integrated part of its
numerical search procedure. However, reporting by researchers does not yet contain sufficient
detail to enable good comparisons to be made between modelling based on transformed and/or
non-transformed model inputs and/or outputs (Beriro et al., 2012b). To overcome this uncertainty,
subsequent users are encouraged to experiment with different options, so as to find out
which type of solution is best for the problem being examined and, where possible, to report the
results of such explorations.
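As a starting point for such experiments, the following Python sketch shows one way to apply linear rescaling and simple truncation to a series before modelling, and to map values back to their original units afterwards. The function names (fit_linear_rescale, truncate), the target range of 0 to 1 and the example values are assumptions made for illustration; they do not reproduce the pre-processing options of Eureqa or GeneXproTools.

```python
def fit_linear_rescale(values, new_min=0.0, new_max=1.0):
    """Return (forward, inverse) functions mapping the observed range
    of `values` linearly onto [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        raise ValueError("cannot rescale a constant series")
    scale = (new_max - new_min) / (old_max - old_min)

    def forward(v):
        # Original units -> rescaled units.
        return new_min + (v - old_min) * scale

    def inverse(s):
        # Rescaled units -> original units.
        return old_min + (s - new_min) / scale

    return forward, inverse


def truncate(values, lower, upper):
    """Clip every value to the closed interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


if __name__ == "__main__":
    # Invented groundwater depths (m), with one suspiciously large value.
    depths = [12.0, 15.5, 13.2, 40.0, 14.1]

    # Option A: rescale the raw series.
    forward, inverse = fit_linear_rescale(depths)
    print([round(forward(v), 3) for v in depths])

    # Option B: truncate first (limits chosen by hand here), then rescale.
    clipped = truncate(depths, 10.0, 20.0)
    forward_c, inverse_c = fit_linear_rescale(clipped)
    print([round(forward_c(v), 3) for v in clipped])

    # Model outputs can be mapped back to metres with the inverse function.
    print(round(inverse(forward(15.5)), 3))
```

Fitting the scaling parameters on the calibration data only, and reporting errors after the inverse mapping, would keep comparisons between transformed and non-transformed runs in the same units.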