linked by addition and a function set composed of the basic arithmetic
operators plus the sqrt(x), exp(x), and ln(x) functions to spice things up a
bit; and a standard fitness function such as the one based on the mean squared
error (see section 3.2.2, Fitness Functions for Symbolic Regression), is a good
starting point for most problems of symbolic regression.
4.1.2 Function Finding on a Five-dimensional Parameter Space
The goal of this section is to show how gene expression programming can be
used to model complex realities with high accuracy. The test function chosen
is the following five-parameter function:
y = \frac{\sin(a) \cdot \cos(b)}{\sqrt{\exp^{c}}} + \tan(d - e) \qquad (4.6)
where a, b, c, d, and e are the independent variables, and exp is the irrational
number 2.71828183.
Suppose we were given a sampling of the numerical values from this
function over 100 random points in the interval [0, 1] and we wanted to find
a function fitting those values as accurately as possible. We can use, for
instance, the mean squared error to design the fitness function and evaluate
each candidate solution by equation (3.4a), giving f_max = 1000.
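As a rough illustration of a fitness function built on the mean squared error, consider the minimal Python sketch below. Equation (3.4a) is not restated in this section, so the scaling `f_max / (1 + E)` is an assumption chosen only because it yields a maximum fitness of 1000 for a perfect fit; the function and parameter names are likewise hypothetical.

```python
def mean_squared_error(predicted, target):
    """Mean squared error between model outputs and target values."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    n = len(target)
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / n


def fitness(predicted, target, f_max=1000.0):
    """Map the error onto (0, f_max]; a perfect fit scores f_max.

    Assumption: the scaling f_max / (1 + E) is used here only because it is
    consistent with f_max = 1000; this section does not restate the formula.
    """
    return f_max / (1.0 + mean_squared_error(predicted, target))
```

Here `predicted` would hold the values a candidate program returns for the fitness cases and `target` the sampled values of the function being modeled.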
The set of 100 fitness cases used in this complex task could very well be
unrepresentative of the problem domain, in which case the program designed by
the algorithm would be modeling a reality other than the reality of function (4.6).
To solve this dilemma, it is common to use a testing set with a reasonable
amount of sample cases. This dataset is not used during the learning process
and therefore can be used to check the usefulness of the model, or in other
words, its generalizing capabilities. For this problem, and because it does
not delay evolution, we will be generous and use a testing set of 200
computer-generated samples. In real-world problems where samples are costly, it
is common practice to use between 30 and 35% of samples for testing.
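The sampling scheme described above can be sketched as follows. This is only an illustration: the body of `target` is an assumed reading of function (4.6), and the seed, the helper names, and the `test_fraction` default are hypothetical choices rather than anything prescribed by the text.

```python
import math
import random


def target(a, b, c, d, e):
    # Assumed reading of function (4.6): sin(a)*cos(b)/sqrt(exp^c) + tan(d - e)
    return math.sin(a) * math.cos(b) / math.sqrt(math.exp(c)) + math.tan(d - e)


def make_samples(n, rng):
    """Draw n points uniformly from [0, 1)^5 and attach the target value."""
    samples = []
    for _ in range(n):
        point = tuple(rng.random() for _ in range(5))
        samples.append((point, target(*point)))
    return samples


def split_samples(samples, rng, test_fraction=0.3):
    """Hold out a fraction of scarce samples for testing (30% by default)."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rng = random.Random(0)  # arbitrary fixed seed so the sketch is repeatable
training_set = make_samples(100, rng)  # fitness cases used during evolution
testing_set = make_samples(200, rng)   # never seen during learning
```

When samples can be generated at will, as here, the two sets are simply drawn separately; `split_samples` covers the case where one fixed pool of costly samples has to be divided.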
The domain of this problem suggests, besides the basic arithmetic functions,
the use of sqrt(x), exp(x), sin(x), cos(x), and tan(x) in the function set,
which, for simplicity, will be represented, respectively, by the symbols “Q”,
“E”, “S”, “C”, and “T”. Thus, for this problem, the function set consists of
F = {+, -, *, /, Q, E, S, C, T}, and the set of terminals consists of
the independent variables, giving T = {a, b, c, d, e}.
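One way to make the function and terminal sets concrete is a lookup table from each symbol to its arity and implementation, as in the sketch below. The nested-tuple tree representation and the `evaluate` helper are assumptions made purely for illustration; they are not the encoding used by gene expression programming itself, and no protection against division by zero or square roots of negative numbers is attempted.

```python
import math

# symbol -> (arity, implementation); Q, E, S, C, T follow the naming in the text
FUNCTIONS = {
    "+": (2, lambda x, y: x + y),
    "-": (2, lambda x, y: x - y),
    "*": (2, lambda x, y: x * y),
    "/": (2, lambda x, y: x / y),
    "Q": (1, math.sqrt),
    "E": (1, math.exp),
    "S": (1, math.sin),
    "C": (1, math.cos),
    "T": (1, math.tan),
}
TERMINALS = ("a", "b", "c", "d", "e")


def evaluate(tree, env):
    """Evaluate a nested-tuple expression tree, e.g. ("+", "a", ("S", "b")).

    A bare string is a terminal looked up in env; a tuple is a function
    symbol followed by its argument subtrees (hypothetical representation).
    """
    if isinstance(tree, str):
        return env[tree]
    symbol, *args = tree
    arity, func = FUNCTIONS[symbol]
    if len(args) != arity:
        raise ValueError(f"{symbol} expects {arity} argument(s)")
    return func(*(evaluate(arg, env) for arg in args))
```

For example, `evaluate(("+", "a", ("S", "b")), {"a": 1.0, "b": 0.0})` computes a + sin(b) for the given values of the terminals.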