Genetic Programming - GeoComputation


experimentation and involves a considerable amount of trial and error testing. The operational

controls of GP are akin to biological evolution, that is, creation of populations, intergenerational

evolution controlled by genetic operators and survival of the fittest. As a result, the terminology

used in this chapter, and to a much greater extent in more technical texts, is heavily reflective of

that used to describe Darwinian evolution. The evolved equations are usually represented/stored

in the form of one or more hierarchically coded lines of computer code, which function as small

computer programs that can be executed using appropriate software. This mix of biological and

computational analogies gives rise to the term GP, in which genetic and evolutionary procedures

are used to develop problem-solving programs.
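
As a minimal sketch of what a hierarchically coded program might look like, the Python fragment below stores an equation as a nested tuple and executes it recursively. The tuple encoding, the `FUNCTIONS` table and the `evaluate` helper are illustrative assumptions, not the representation used by any particular GP package.

```python
import operator

# Hypothetical function set: symbol -> (callable, number of arguments).
FUNCTIONS = {
    "+": (operator.add, 2),
    "-": (operator.sub, 2),
    "*": (operator.mul, 2),
}

def evaluate(node, variables):
    """Recursively execute a tree-coded program for one set of input values."""
    if isinstance(node, tuple):
        symbol, *children = node
        func, arity = FUNCTIONS[symbol]
        if len(children) != arity:
            raise ValueError(f"{symbol!r} expects {arity} arguments")
        return func(*(evaluate(child, variables) for child in children))
    if isinstance(node, str):
        return variables[node]   # an input variable, looked up by name
    return node                  # a numeric constant

# The equation x*x + 2*y, coded hierarchically.
program = ("+", ("*", "x", "x"), ("*", 2.0, "y"))
result = evaluate(program, {"x": 3.0, "y": 0.5})
print(result)
```

Because the program is ordinary data, genetic operators can cut and recombine subtrees without needing to understand what the equation means.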

The GP paradigm was first elucidated by Koza (1990) in a report he produced while at Stanford

University. The importance of his work is far-reaching, not least for the range of example applications

it included. Koza importantly unified the purpose of GP by stating:

…the underlying common problem is discovery of a computer program that produces some desired

output when presented with particular inputs. (Koza, 1990: 1)

Koza subsequently reported a range of example problems that required the discovery of such a

computer program, which, among others, included machine learning, process planning in artificial

intelligence and robotics, symbolic function identification (time-series induction), symbolic regression,

empirical discovery, solving of functional equations, pattern recognition, game playing and

neural network design (Koza, 1990).

GP presently offers one of the best effort/reward data-driven modelling opportunities available

to researchers. It does so because difficult non-linear problems may be modelled without any a

priori assumptions being made about the mathematical form that a solution should take and, in

certain cases, the kind of relationship expected from the data. The role of each user in the evolution

of solutions, however, is not that of a disinterested player or passive bystander, since

important decisions must be made about the initial scope and content of potential model drivers

(i.e. some specified set of input variables is selected), functional operators (i.e. some specified set

of mathematical symbols is used) and software settings. GP is thus seen to present something of a

scientific conundrum requiring very careful application: on the one hand, it is analogous to a magic

bullet that seemingly offers rapid, powerful and transparent delivery of non-linear problem-solving

equations. The development and reporting of novel and interesting solutions in an open and transparent

way is surely good. Yet, on the other hand, as per the title of this chapter, perhaps it is a poisoned

chalice, since it can simultaneously deliver an exposed solution, one that is open to third-party

inspection, testing and potential disapproval. Indeed, luring the unwary, it can sometimes deliver

unexpectedly misleading, irrational or overly complex solutions that might otherwise normally be

rejected but unfortunately possess a high level of fit. This chapter will equip you with the necessary

tools and wisdom to identify potential deficiencies in each resultant program and help you decide

whether, and how, to reject poorer-quality models. By resolving this dichotomy, you will be left

with an exciting and, thanks to recent software developments, extremely easy-to-use data-driven

modelling tool, which evolves user-friendly solutions that can be transposed into any spreadsheet or

statistical computing package for further evaluation and application.
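
The user decisions described above (input variables, functional operators and software settings) can be made concrete with a small symbolic regression sketch in Python. Everything in it is an assumption chosen for illustration: the terminal and function sets, the settings, the target equation y = x*x + x + 1, and the use of tournament selection, subtree crossover, subtree mutation and elitism. It is not the procedure of any specific GP software.

```python
import operator
import random

# User decisions; every value below is an illustrative assumption.
TERMINALS = ["x", 1.0, 2.0]            # model drivers (input variables) and constants
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}  # functional operators
POP_SIZE, GENERATIONS, MAX_DEPTH, MAX_SIZE, TOURNAMENT = 100, 20, 4, 60, 3  # software settings

def random_tree(rng, depth):
    """Grow a random program: terminals at the leaves, functions at internal nodes."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    symbol = rng.choice(sorted(FUNCTIONS))
    return (symbol, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(node, x):
    """Execute a tree-coded program for one input value."""
    if isinstance(node, tuple):
        return FUNCTIONS[node[0]](evaluate(node[1], x), evaluate(node[2], x))
    return x if node == "x" else node

def size(node):
    """Count nodes, a simple proxy for model complexity."""
    if isinstance(node, tuple):
        return 1 + size(node[1]) + size(node[2])
    return 1

def error(node, cases):
    """Sum of squared errors over (input, target) cases; lower is fitter."""
    total = 0.0
    for x, y in cases:
        diff = evaluate(node, x) - y
        total += diff * diff
    return total

def paths(node, prefix=()):
    """Yield the index path to every node in the tree."""
    yield prefix
    if isinstance(node, tuple):
        for i in (1, 2):
            yield from paths(node[i], prefix + (i,))

def get(node, path):
    """Return the subtree found by following `path` from the root."""
    for i in path:
        node = node[i]
    return node

def put(node, path, new):
    """Return a copy of the tree with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    parts = list(node)
    parts[path[0]] = put(node[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng, a, b):
    """Subtree crossover: graft a random subtree of b onto a random point of a."""
    donor = get(b, rng.choice(list(paths(b))))
    return put(a, rng.choice(list(paths(a))), donor)

def mutate(rng, a):
    """Subtree mutation: replace a random subtree with a freshly grown one."""
    return put(a, rng.choice(list(paths(a))), random_tree(rng, 2))

def evolve(cases, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng, MAX_DEPTH) for _ in range(POP_SIZE)]
    best = min(population, key=lambda t: error(t, cases))
    for _ in range(GENERATIONS):
        def select():  # tournament selection: survival of the fittest
            return min(rng.sample(population, TOURNAMENT), key=lambda t: error(t, cases))
        offspring = [best]  # elitism: the best program so far always survives
        while len(offspring) < POP_SIZE:
            parent = select()
            child = crossover(rng, parent, select())
            if rng.random() < 0.1:
                child = mutate(rng, child)
            # Reject oversized children to limit needlessly complex solutions.
            offspring.append(child if size(child) <= MAX_SIZE else parent)
        population = offspring
        best = min(population, key=lambda t: error(t, cases))
    return best

# Hypothetical data set generated from y = x*x + x + 1.
cases = [(x / 2.0, (x / 2.0) ** 2 + x / 2.0 + 1.0) for x in range(-4, 5)]
best = evolve(cases)
print(best, "error:", error(best, cases), "size:", size(best))
```

Printing the size alongside the error is deliberate: a low error from a very large tree is the kind of well-fitting but overly complex solution that deserves inspection before it is accepted.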

GP is an evolutionary algorithm (Heppenstall and Harland, 2014): a generic population-based

metaheuristic optimisation mechanism, that is, a general-purpose algorithmic framework that

can be applied to different optimisation problems, but which needs relatively few modifications to

adapt it to a specific problem. It is also a member of an important sub-group, called evolutionary

program-induction algorithms (EPIA) (Graff and Poli, 2010). EPIAs are designed to evolve small

computer programs, comprising executable code, that can be used to solve problems of a similar

nature to the one used during their development. The membership of this group includes several

other popular algorithms: genetic algorithm (GA) (Holland, 1975; Miller and Thompson, 2000),

grammatical evolution (GE) (Ryan et al., 1998), linear genetic programming (LGP) (Brameier and
