Information Technology Reference
In-Depth Information
attribute “play”), which will be represented respectively by “a” and “b”, thus
giving T = {a, b}.
And now the rules for encoding a decision tree in a linear genome are very
similar to the rules used to encode mathematical expressions with functions
and variables in the nodes of the trees (this is explained in chapter 2, The
Entities of Gene Expression Programming). So, for decision tree induction,
the genes will also have a head and a tail, with the head containing attributes
and terminals and the tail containing only terminals. The reason for this is
again to ensure that all the decision trees created by GEP are always valid
programs. Furthermore, the size of the tail t is also dictated by the head size
h and the number of branches of the attribute with more branches n max and is
evaluated by the equation:
t = h ( n max - 1) + 1 (9.1)
So, for the play tennis data of Table 9.1, both OUTLOOK and TEMPERA-
TURE branch into three, whereas HUMIDITY and WINDY branch into two,
thus giving maximum arity n max = 3. For instance, if a head size of five is
chosen, then the tail size will be t = 5 · (3 - 1) + 1 = 11 and the gene size g
will be g = h + t = 5 + 11 = 16. One such gene is shown below:
0123456789012345
HOTbWaaaaaabbbbb (9.2)
The expression of this kind of gene is done in exactly the same manner as in
all the GEP systems, giving the following decision tree:
H
O
T
a
a
a
a
b
W
a
a
This simplified decision tree can also be represented as a conventional deci-
sion tree, that is, with all the branches clearly labeled (Figure 9.2). And as
you can easily check, it solves correctly 12 of the 14 instances of Table 9.1.
The process of decision tree induction with GEP starts, as usual, with an
initial population of randomly created chromosomes. Then these
Search WWH ::




Custom Search