in which output from Model A is combined with output from Model B) and more recently for the
explicit purposes of developing multimodel combinations (e.g. Fernando et al., 2012; Kisi and
Shiri, 2011; Mostafavi et al., 2013; Nourani et al., 2012). In each case, the univariate output derived
from the GP equation can be used in subsequent data-driven modelling operations. Conversely,
conjunction modelling may also encompass some form of preliminary data transformation tool, such as
wavelet analysis or principal component analysis, which is first applied to the raw data, before any
exploration or modelling using GP.
8.2 SYMBOLIC REGRESSION
GP is most commonly used in a geographical setting to derive predictive models for a particular
problem by means of symbolic regression, with applications ranging from modelling the intricacies
of a climatic process (e.g. Stanislawska et al., 2012) to analysing remote sensing data (Momm and
Easson, 2011). This chapter, consequently, focuses on symbolic regression. It should nevertheless
be stressed that GP can also be used to model other things such as formal logic or source code.
For completeness, other geographical/environmental GP applications include classification (e.g.
Padarian et al., 2012), decision trees (e.g. Wang et al., 2006) and logistic regression (e.g. Seckin and
Guven, 2012).
At this point, you may be asking: (1) What is symbolic regression? and (2) Why not use existing
linear, quadratic or polynomial regression techniques for my analysis? The answer to the second
question is simple: if it works, then you probably should, especially if the problem is of a known
form, such as linear, near-linear or another standard function. The answer to the first requires a
little more explanation. Conventional regression is deterministic, resulting in the same model or
answer every time you apply the analysis. For example, the form of a linear regression function is fixed,
as are the variables used to examine the problem; the only parts determined by the regression of Y
on X_1 and X_2 are the coefficients a, b and c, as depicted in the following equation:
$$Y = a + bX_1 + cX_2 \tag{8.1}$$
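To make the point about fixed form and determinism concrete, the following is a minimal sketch of fitting Equation 8.1 by ordinary least squares in plain Python. The function name `fit_linear` and the small data set are hypothetical and are not taken from the chapter; the data are generated from known coefficients so that the fit can be checked by eye.

```python
def fit_linear(x1, x2, y):
    """Least-squares estimates of (a, b, c) in Y = a + b*X1 + c*X2."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]
    # Normal equations (A^T A) beta = A^T y, stored as an augmented 3x4 matrix.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(3):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return tuple(m[i][3] for i in range(3))


# Hypothetical data generated from Y = 2 + 3*X1 - 1*X2 (an assumption for illustration).
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [2.0 + 3.0 * u - 1.0 * v for u, v in zip(x1, x2)]

first = fit_linear(x1, x2, y)
second = fit_linear(x1, x2, y)
print(first)            # estimates of a, b and c, up to floating-point rounding
print(first == second)  # repeating the analysis on the same data gives the same answer
```

Only the three coefficients are estimated here; the additive linear form and the choice of X_1 and X_2 are decided by the analyst before the fit begins.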
In contrast, by way of a simple definition, Koza (1990, p. 2) suggests that symbolic regression:
…requires finding a function, in symbolic form, that fits a given sampling of values of the dependent
variable associated with particular given values of the independent variable(s).
In other words, conventional regression is about function fitting, whereas symbolic regression is
about function discovery and fitting. Symbolic regression, as with all GP, is stochastic: the
final answer will in most cases differ slightly from one model run to the next. All variables are
treated as quasi-random elements whose use depends on the software settings selected
by the user and on any limitations imposed by the technique; that is, there is no need for any a
priori assumptions to be made about the data or the form of the solution.
The success of symbolic regression is at least in part due to its data-driven nature, where knowl-
edge of the natural system or functional form of the solution is not an imperative, and solutions
are quasi-randomly evolved using a data set that describes the problem which is to be investigated.
In this way, the technique is able to overcome some of the assumptions that bind statistical or
mechanistic modelling, as well as some of the limitations associated with polynomial or para-
metric regression and/or black-box modelling (Schmidt et al., 2011). That said, the idea that one
should remain ignorant as to the provenance of the data or problem is somewhat absurd. A simple
symbolic regression example follows which is based on a small artificial data set from which a
perfect solution could be derived (Table 8.1). The problem was examined using GeneXproTools 4.0,
in which different combinations of input symbols (D0-D5) and mathematical functions (+, −, /, *)
were manipulated by the software such that the dependent variable could be predicted from our
independent inputs with a high degree of accuracy.
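The general idea of evolving combinations of input symbols and arithmetic functions can be illustrated with a toy sketch in plain Python. This is not the algorithm used by GeneXproTools, and it does not use the data in Table 8.1: the three inputs, the target relationship D0 * D1 + D2, the population settings and all function names below are assumptions chosen for illustration. Expressions are random trees over the function set (+, -, *, /), improved by repeatedly keeping the best candidates and mutating them.

```python
import math
import random

FUNCS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,  # protected division
}


def random_tree(rng, n_vars, depth):
    """Grow a random expression: a variable index (leaf) or (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_vars)
    op = rng.choice(list(FUNCS))
    return (op, random_tree(rng, n_vars, depth - 1),
            random_tree(rng, n_vars, depth - 1))


def evaluate(tree, row):
    if isinstance(tree, int):
        return row[tree]
    op, left, right = tree
    return FUNCS[op](evaluate(left, row), evaluate(right, row))


def error(tree, rows, targets):
    """Sum of squared errors; non-finite totals are treated as infinitely bad."""
    total = 0.0
    for row, target in zip(rows, targets):
        diff = evaluate(tree, row) - target
        total += diff * diff
    return total if math.isfinite(total) else float("inf")


def mutate(rng, tree, n_vars, depth=2):
    """Replace one randomly chosen subtree with a freshly grown one."""
    if isinstance(tree, int) or rng.random() < 0.3:
        return random_tree(rng, n_vars, depth)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(rng, left, n_vars, depth), right)
    return (op, left, mutate(rng, right, n_vars, depth))


def to_string(tree):
    if isinstance(tree, int):
        return f"D{tree}"
    op, left, right = tree
    return f"({to_string(left)} {op} {to_string(right)})"


def evolve(rows, targets, seed, pop_size=50, generations=40):
    rng = random.Random(seed)
    n_vars = len(rows[0])
    pop = [random_tree(rng, n_vars, 3) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: error(t, rows, targets))
        survivors = pop[: pop_size // 5]  # the best fifth is carried over unchanged
        children = [mutate(rng, rng.choice(survivors), n_vars)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    best = min(pop, key=lambda t: error(t, rows, targets))
    return best, error(best, rows, targets)


# Hypothetical data (not Table 8.1): the target is D0 * D1 + D2.
rows = [(1.0, 2.0, 3.0), (2.0, 1.0, 0.0), (3.0, 3.0, 1.0),
        (4.0, 0.5, 2.0), (0.5, 4.0, 1.0), (2.0, 2.0, 2.0)]
targets = [d0 * d1 + d2 for d0, d1, d2 in rows]

for seed in (1, 2, 3):
    best, err = evolve(rows, targets, seed)
    print(seed, to_string(best), round(err, 4))
```

Each seed stands in for a separate model run. Because the search is driven by random choices, different seeds may return different expressions and different errors, whereas repeating a run with the same seed reproduces the same result.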