experimentation and involves a considerable amount of trial and error testing. The operational
controls of GP are akin to biological evolution, that is, creation of populations, intergenerational
evolution controlled by genetic operators and survival of the fittest. As a result, the terminology
used in this chapter, and to a much greater extent in more technical texts, is heavily reflective of
that used to describe Darwinian evolution. The evolved equations are usually represented/stored
in the form of one or more hierarchically coded lines of computer code, which function as small
computer programs that can be executed using appropriate software. This mix of biological and
computational analogies gives rise to the term GP, in which genetic and evolutionary procedures
are used to develop problem-solving programs.
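As a minimal illustration of how an evolved equation can be stored as a hierarchically coded program and then executed, the Python sketch below represents an expression as a nested tuple and evaluates it recursively. The tuple representation, the names FUNCTIONS, evaluate and to_string, and the example equation are all assumptions made for illustration; they are not taken from Koza (1990) or from any particular GP package.

```python
import operator

# Hypothetical function set: each symbol maps to a two-argument operation.
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}

def evaluate(node, inputs):
    """Execute a tree: a tuple is a function node, a str is an input
    variable looked up in `inputs`, anything else is a numeric constant."""
    if isinstance(node, tuple):
        symbol, left, right = node
        return FUNCTIONS[symbol](evaluate(left, inputs), evaluate(right, inputs))
    if isinstance(node, str):
        return inputs[node]
    return node

def to_string(node):
    """Render the tree as a conventional infix equation."""
    if isinstance(node, tuple):
        symbol, left, right = node
        return f"({to_string(left)} {symbol} {to_string(right)})"
    return str(node)

# Illustrative equation y = (x * x) + (2 * x), stored hierarchically.
program = ("+", ("*", "x", "x"), ("*", 2, "x"))

print(to_string(program))
print(evaluate(program, {"x": 3}))
```

Because the stored program is just a nested data structure, genetic operators can rearrange or replace its sub-trees without ever producing a syntactically invalid equation.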
The GP paradigm was first elucidated by Koza (1990) in a report he produced while at Stanford
University. The importance of his work is far reaching, not least for the range of example
applications it included. Koza importantly unified the purpose of GP by stating:
…the underlying common problem is discovery of a computer program that produces some desired
output when presented with particular inputs. (Koza, 1990: 1)
Koza subsequently reported a range of example problems that required the discovery of such a
computer program, which, among others, included machine learning, process planning in artificial
intelligence and robotics, symbolic function identification (time-series induction), symbolic
regression, empirical discovery, solving of functional equations, pattern recognition, game playing and
neural network design (Koza, 1990).
GP presently offers one of the most favourable ratios of effort to reward of any data-driven
modelling approach available to researchers. It does so because difficult non-linear problems may be
modelled without any a priori assumptions being made about the mathematical form that a solution
should take and, in certain cases, the kind of relationship expected from the data. The role of each
user in the evolution of solutions, however, is not that of a disinterested player or passive
bystander, since important decisions must be made about the initial scope and content of potential
model drivers (i.e. some specified set of input variables is selected), functional operators (i.e.
some specified set of mathematical symbols is used) and software settings. GP is thus seen to present something of a
scientific conundrum requiring very careful application. On the one hand, it is analogous to a magic
bullet that seemingly offers rapid, powerful and transparent delivery of non-linear problem-solving
equations. The development and reporting of novel and interesting solutions in an open and
transparent way is surely good. Yet, on the other hand, as per the title of this chapter, perhaps it is a
poisoned chalice, since it simultaneously delivers an exposed solution, one that is open to third-party
inspection, testing and potential disapproval. Indeed, luring the unwary, it can sometimes deliver
unexpectedly misleading, irrational or overly complex solutions that would normally be
rejected but unfortunately possess a high level of fit. This chapter will equip you with the necessary
tools and wisdom to identify potential deficiencies in each resultant program and help you decide
whether, and how, to reject poorer quality models. By resolving this dichotomy, you will be left
with an exciting and, thanks to recent software developments, extremely easy-to-use data-driven
modelling tool, which evolves user-friendly solutions that can be transposed into any spreadsheet or
statistical computing package for further evaluation and application.
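The user decisions described above (input variables, functional operators and software settings) and the tension between fit and complexity can be made concrete with a small symbolic regression sketch in Python. Everything in it is an illustrative assumption rather than the procedure of this chapter or of any specific GP software: the names INPUTS, FUNCTIONS, SETTINGS and evolve are hypothetical, the settings are arbitrary, and the search uses only tournament selection, elitism and subtree mutation, with crossover omitted for brevity.

```python
import operator
import random

# User decisions (all values are illustrative assumptions).
INPUTS = ["x"]                                   # candidate model drivers
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
SETTINGS = {"pop_size": 60, "generations": 30, "max_depth": 3,
            "tournament": 3, "seed": 1}          # software settings

def random_tree(rng, depth):
    """Grow a random expression tree no deeper than `depth`."""
    if depth <= 0 or rng.random() < 0.3:
        return rng.choice(INPUTS + [1.0, 2.0])   # input variable or constant
    symbol = rng.choice(sorted(FUNCTIONS))
    return (symbol, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(node, inputs):
    """Execute a tree for one set of input values."""
    if isinstance(node, tuple):
        return FUNCTIONS[node[0]](evaluate(node[1], inputs),
                                  evaluate(node[2], inputs))
    return inputs[node] if isinstance(node, str) else node

def size(node):
    """Number of nodes: a crude measure of model complexity."""
    return 1 + size(node[1]) + size(node[2]) if isinstance(node, tuple) else 1

def error(node, data):
    """Sum of squared errors over (x, y) pairs; lower means fitter."""
    return sum((evaluate(node, {"x": x}) - y) ** 2 for x, y in data)

def mutate(rng, node, depth):
    """Subtree mutation: replace one randomly chosen subtree."""
    if not isinstance(node, tuple) or rng.random() < 0.3:
        return random_tree(rng, depth)
    symbol, left, right = node
    if rng.random() < 0.5:
        return (symbol, mutate(rng, left, depth - 1), right)
    return (symbol, left, mutate(rng, right, depth - 1))

def evolve(data, settings=SETTINGS):
    """Evolve a population; return the best tree and its error per generation."""
    rng = random.Random(settings["seed"])
    population = [random_tree(rng, settings["max_depth"])
                  for _ in range(settings["pop_size"])]
    history = []
    for _ in range(settings["generations"]):
        best = min(population, key=lambda t: error(t, data))
        history.append(error(best, data))
        next_gen = [best]                        # elitism: keep the fittest
        while len(next_gen) < settings["pop_size"]:
            contenders = rng.sample(population, settings["tournament"])
            parent = min(contenders, key=lambda t: error(t, data))
            next_gen.append(mutate(rng, parent, settings["max_depth"]))
        population = next_gen
    best = min(population, key=lambda t: error(t, data))
    history.append(error(best, data))
    return best, history

# Synthetic observations from y = x * x + x, used only to exercise the sketch.
data = [(x, x * x + x) for x in range(-3, 4)]
best, history = evolve(data)
print("best model:", best)
print("error:", history[-1], "size:", size(best))
```

Reporting the size of the best tree alongside its error is one simple way to notice a solution that fits well but is more complex than the problem warrants; changing any entry in INPUTS, FUNCTIONS or SETTINGS changes which solutions can be found at all.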
GP is an evolutionary algorithm (Heppenstall and Harland, 2014): a generic population-based
metaheuristic optimisation mechanism, that is, a general-purpose algorithmic framework that
can be applied to different optimisation problems with relatively few modifications needed to
adapt it to a specific problem. It is also a member of an important sub-group called evolutionary
program-induction algorithms (EPIA) (Graff and Poli, 2010). EPIAs are designed to evolve small
computer programs, comprising executable code, that can be used to solve problems of a similar
nature to the one used during their development. The membership of this group includes several
other popular algorithms: genetic algorithm (GA) (Holland, 1975; Miller and Thompson, 2000),
grammatical evolution (GE) (Ryan et al., 1998), linear genetic programming (LGP) (Brameier and