Geoscience Reference
In-Depth Information
Banzhaf, 2001, 2007) and gene expression programming (GEP) (Ferreira, 2001). The common
theme in each technique is that they derive transparent solutions using principles borrowed
from biological evolutionary science and evolve models that can be easily evaluated. From
a modelling viewpoint, GP outputs should not be classed as black-box models, that is, models
whose inner workings are not transparent and whose behaviour is therefore difficult to interpret
physically. Unfortunately, in many cases, GP modelling is incorrectly treated as a black-box tool.
This is perhaps because the modelling process itself, which is highly automated and contains a
large number of operational parameter settings, is perceived to be shrouded in mystery - an
unfortunate misconception that this chapter intends to address.
EPIAs have become established tools and provide widely accepted norms for resolving complex
non-linear problems, including different types of modelling required by geographers. Keyword
searches performed using the bibliographic database, Scopus (www.scopus.com), revealed over
95,000 hits for 'genetic algorithm*', over 6,000 hits for 'genetic programming' and over 500 hits
for 'gene expression programming'. In each case, results showed mostly incremental growth since
2000 (Scopus Sciverse, 2013). These records indicate that a model breeding approach to resolv-
ing scientific problems is popular in many sectors of the international research community. In the
late 1990s, spatial data rapidly became much more widely available, which stimulated GP-based
spatial analysis (Diplock and Openshaw, 1996). Most of this body of work understandably rests
within the disciplines of computer science, engineering and mathematics - around 5% relates to
the geographical domain. However, more recently, the world has witnessed the proliferation of
global climatological and hydrological data - so-called Spatial Big Data (Sui, 2014) - in response
to an international requirement to better understand global environmental issues, as well as a
desire to find fresh ways to help resolve old problems, such as pan evaporation (e.g. Guven and
Kisi, 2010). GP applications in geography have recently expanded from their initial role in per-
forming spatial analysis operations (e.g. Diplock, 1998; Litschert, 2004; Parasuraman et al., 2007;
Sheeren et al., 2006) into modelling environmental processes: reported fields of enquiry include
rainfall forecasting (e.g. Hashmi et al., 2011; Kashid and Maity, 2012; Mondal and Mujumdar,
2012), river flow forecasting (e.g. Azamathulla and Zahiri, 2012; Maheswaran and Khosa, 2011;
Mondal and Mujumdar, 2012; Sivapragasam et al., 2011), sediment transport (e.g. Ab Ghani and
Azamathulla, 2011, 2012), soil mechanics (Mollahasani et al., 2011; Padarian et al., 2012) and
ecological applications (Jeong et al., 2011; Wang et al., 2012). For a more general and extensive source
of information, the reader is encouraged to consult the genetic programming bibliography hosted
by the University of Birmingham, United Kingdom (http://www.cs.bham.ac.uk/~wbl/biblio/).
Not all GP studies approach the problem they are designed to examine in the same way. It is
important to distinguish between different types of study. One way is to look at the type of data
being used in model development and testing. For example, there are two principal data types that
may be selected as input/output variables: one is observed data and the other is calculated data
(i.e. values derived from a specified equation). Each type will produce a different kind of model:
(1) a simulator which resolves natural processes using observed data or (2) an emulator which is a
model of the predictions or estimations made by an existing equation or set of equations and is usu-
ally intended to reduce model complexity (Abrahart et al., 2012; Beriro et al., 2013).
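The distinction between the two data types can be made concrete with a minimal sketch. Everything named here is hypothetical and introduced only for illustration: the helper functions, the stand-in equation toy_equation, its two inputs and all numbers are made up and are not taken from any of the cited studies. The point is only that a simulator is trained against observed targets, whereas an emulator is trained against targets calculated from an existing equation.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, ...]
Dataset = List[Tuple[Row, float]]


def simulator_dataset(inputs: Sequence[Row], observed: Sequence[float]) -> Dataset:
    """Pair each input record with the value that was actually observed."""
    if len(inputs) != len(observed):
        raise ValueError("inputs and observed must have the same length")
    return list(zip(inputs, observed))


def emulator_dataset(inputs: Sequence[Row], equation: Callable[..., float]) -> Dataset:
    """Pair each input record with the value calculated by an existing equation."""
    return [(row, equation(*row)) for row in inputs]


def toy_equation(temperature: float, wind_speed: float) -> float:
    """Hypothetical stand-in for an established equation (assumption, not a real formula)."""
    return 0.5 * temperature + 2.0 * wind_speed


# Made-up illustrative records: (temperature, wind_speed).
inputs = [(20.0, 1.0), (25.0, 2.0), (30.0, 0.5)]
observed = [11.8, 17.1, 15.6]  # made-up "measurements"

sim = simulator_dataset(inputs, observed)     # targets come from observation
emu = emulator_dataset(inputs, toy_equation)  # targets come from the equation
```

Either data set could then be passed to a GP tool in the same way; only the origin of the target column differs, and with it what the evolved model can be said to represent.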
Another important difference between GP studies is the way in which authors tackle model
evaluation. A large number of GP explorations compare their model outputs with counterpart data-
driven modelling outputs, using the results to select a preferred solution. This approach tends to
focus on comparing the goodness-of-fit statistics of derived solutions instead of exploring the poten-
tial for meaningful knowledge discovery on offer from the GP solution using techniques such as
sensitivity analysis.
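As an indication of what such an analysis might involve, the following is a minimal sketch of a one-at-a-time sensitivity test, assuming the evolved solution is available as an ordinary function. The function evolved_model, its inputs x1 and x2 and the 10% perturbation are hypothetical choices made for illustration; they do not reproduce the procedure of any study cited here.

```python
from typing import Callable, Dict


def one_at_a_time_sensitivity(
    model: Callable[..., float],
    baseline: Dict[str, float],
    rel_step: float = 0.1,
) -> Dict[str, float]:
    """Change in model output when each input is increased by rel_step in turn.

    All other inputs are held at their baseline values. An input whose
    baseline value is zero is left unchanged by a relative perturbation,
    so its reported change will be zero.
    """
    base_output = model(**baseline)
    changes = {}
    for name, value in baseline.items():
        perturbed = dict(baseline)
        perturbed[name] = value * (1.0 + rel_step)
        changes[name] = model(**perturbed) - base_output
    return changes


def evolved_model(x1: float, x2: float) -> float:
    """Hypothetical stand-in for an expression returned by a GP run."""
    return x1 * x1 + 3.0 * x2


changes = one_at_a_time_sensitivity(evolved_model, {"x1": 2.0, "x2": 1.0})
for name, delta in changes.items():
    print(f"{name}: change in output = {delta:.3f}")
```

Because a GP solution is an explicit expression, the response to each input can be inspected directly in this way and compared with physical expectations, which goodness-of-fit statistics alone do not reveal.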
GP can also be used to perform conjunction modelling (e.g. serial operation, in which output
from Model A is fed into Model B) or to develop multimodel combinations (e.g. parallel operation,
* Wildcard