only reward clear hits and, for instance, will not differentiate outputs such as
111 or 001 for class B (output 010). And the third one, let's call it
ambiguity/sharpness, will reward clear hits and partial hits differently and
will also distinguish between different degrees of ambiguity; for instance,
outputs such as 000 or 111 for class C (output 001) will score differently,
the former being less ambiguous, with just one mishit.
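To make the contrast concrete, here is a minimal Python sketch comparing a score that only rewards clear hits with a plain count of matching output bits, which is the quantity the ambiguity/sharpness function builds on. The function names `clear_hit` and `matching_bits` are hypothetical, and outputs are assumed to be strings with one 0/1 bit per class.

```python
def clear_hit(output: str, target: str) -> int:
    """Return 1 only when every output bit matches the target class code."""
    return int(output == target)


def matching_bits(output: str, target: str) -> int:
    """Count the bit positions where the output agrees with the target code."""
    return sum(o == t for o, t in zip(output, target))


if __name__ == "__main__":
    # Class B is coded 010: clear-hit scoring gives 111 and 001 the same score.
    print(clear_hit("111", "010"), clear_hit("001", "010"))  # 0 0
    # Class C is coded 001: counting matching bits separates 000 from 111.
    print(matching_bits("000", "001"), matching_bits("111", "001"))  # 2 1
```

The clear-hit score collapses every imperfect output to zero, whereas the bit count keeps the information needed to rank 000 above 111 for class C.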
The ambiguity/sharpness fitness function performs slightly better than the
other two (and the first is slightly worse than the second) and, therefore, we
are going to use it in our three-class problem. So, let's start by explaining
what is meant by sharpness and ambiguity.
The sharpness S is very easy to compute and corresponds to the number of
partial hits, that is, the number of output bits that match the target; for
instance, for a sample belonging to class A (output 100), the output 101 will
have S = 2 and the output 001 will have S = 1. The ambiguity A is an inverse
measure of sharpness: the higher the sharpness, the smaller the ambiguity. By
definition, maximum sharpness (which obviously corresponds to the number of
classes n) corresponds to a degree of ambiguity of one, that is, when S = n,
then A = 1; S = n - 1 corresponds to A = 2; S = n - 2 corresponds to A = 3;
and so on. Cases where S = 0 contribute nothing to fitness.
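The rules above can be sketched in Python as follows. The closed form A = n - S + 1 is our own generalization of the sequence S = n → A = 1, S = n - 1 → A = 2, S = n - 2 → A = 3; the function names are hypothetical, and returning `None` for S = 0 is just one way of marking a case that contributes nothing to fitness.

```python
from typing import Optional


def sharpness(output: str, target: str) -> int:
    """Sharpness S: number of bit positions where output and target agree."""
    if len(output) != len(target):
        raise ValueError("output and target need one bit per class")
    return sum(o == t for o, t in zip(output, target))


def ambiguity(output: str, target: str) -> Optional[int]:
    """Ambiguity A, assuming the closed form A = n - S + 1.

    S = n gives A = 1, S = n - 1 gives A = 2, S = n - 2 gives A = 3.
    Returns None when S = 0, since such cases contribute nothing to fitness.
    """
    s = sharpness(output, target)
    if s == 0:
        return None
    return len(target) - s + 1


if __name__ == "__main__":
    # Class A is coded 100, as in the examples in the text.
    print(sharpness("101", "100"), sharpness("001", "100"))  # 2 1
    print([ambiguity(o, "100") for o in ("100", "101", "001", "011")])
    # [1, 2, 3, None]
```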
More formally, the fitness f(ij) of an individual program i for sample case j is
evaluated by the equation:
$$ f_{ij} = \frac{1}{A^{3}} \tag{4.15} $$
Thus, when A = 1 (that is, when we have a clear hit and sharpness has its
maximum value), f(ij) = 1. Consequently, for the ambiguity/sharpness fitness
function, maximum fitness is equal to the number of sample cases.
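A minimal sketch of the per-sample and total fitness follows. It assumes equation (4.15) reads f(ij) = 1/A^3 and uses A = n - S + 1 as inferred from the definitions; the exponent is exposed as a parameter in case it is read differently. The names `sample_fitness` and `program_fitness` are hypothetical.

```python
from typing import Sequence


def sample_fitness(output: str, target: str, exponent: int = 3) -> float:
    """f(ij) = 1 / A**exponent for one sample case; 0.0 when S = 0."""
    n = len(target)
    s = sum(o == t for o, t in zip(output, target))  # sharpness S
    if s == 0:
        return 0.0  # no partial hits: the case contributes nothing
    a = n - s + 1  # ambiguity A, equal to 1 for a clear hit
    return 1.0 / a ** exponent


def program_fitness(outputs: Sequence[str], targets: Sequence[str]) -> float:
    """Total fitness of one program i: the sum of f(ij) over all cases j."""
    return sum(sample_fitness(o, t) for o, t in zip(outputs, targets))


if __name__ == "__main__":
    targets = ["100", "010", "001"]
    print(program_fitness(targets, targets))  # 3.0, one point per sample case
    # Partial hits for class C score far below a clear hit.
    print(sample_fitness("000", "001"), sample_fitness("111", "001"))
```

Because every clear hit contributes exactly 1, a program that classifies all sample cases correctly reaches a total equal to the number of cases, which is why maximum fitness equals the number of sample cases.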
For this problem, we will use the same function set used in the previous
section, that is, F = {+, -, *, /}, and also the same set of terminals, which
includes all four attributes of the iris data: sepal length, sepal width,
petal length, and petal width. For this three-class problem, we will
obviously use chromosomes composed of three genes, each encoding a different
sub-model. The same 0/1 rounding threshold of 0.5 was chosen for all the
sub-models to convert their outputs into 0 or 1. The head size of all three
genes is equal to 10, corresponding to a maximum sub-program length of 21
nodes. The fitness will be evaluated by equation (4.15) and, therefore, for
this problem with 150 different plants, f_max = 150.
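The settings above can be checked with a short sketch. The gene length uses the standard GEP tail rule t = h(n - 1) + 1, where n is the maximum arity of the function set (2 for F = {+, -, *, /}), so a head of 10 gives a tail of 11 and 21 nodes in total. The function names are hypothetical, and treating a value exactly equal to 0.5 as 1 is an assumption, since the text does not say how ties are rounded.

```python
from typing import Iterable


def gene_length(head: int, max_arity: int) -> int:
    """GEP gene length: head h plus tail t = h * (n - 1) + 1."""
    tail = head * (max_arity - 1) + 1
    return head + tail


def round_outputs(values: Iterable[float], threshold: float = 0.5) -> str:
    """Turn the raw outputs of the sub-models into a 0/1 class code."""
    # Assumption: a value exactly equal to the threshold is rounded up to 1.
    return "".join("1" if v >= threshold else "0" for v in values)


if __name__ == "__main__":
    # F = {+, -, *, /} has maximum arity 2; head size 10 as in the text.
    print(gene_length(10, 2))  # 21
    # Hypothetical raw outputs of the three genes for one iris sample.
    print(round_outputs([0.83, 0.12, -1.4]))  # 100
```

The rounded code (here 100, i.e. class A) is what the ambiguity/sharpness fitness compares against the target class of each plant.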
Since this is the first time that the GEP-MO system is being put to the test,
we are going to evaluate its performance in an experiment of 100 runs,
and since the algorithm is unable to correctly classify all the sample cases of