Decision Tree Induction - Gene Expression Programming

Information Technology Reference

In-Depth Information

attribute “play”), which will be represented respectively by “a” and “b”, thus

giving T = {a, b}.

And now the rules for encoding a decision tree in a linear genome are very

similar to the rules used to encode mathematical expressions with functions

and variables in the nodes of the trees (this is explained in chapter 2, The

Entities of Gene Expression Programming). So, for decision tree induction,

the genes will also have a head and a tail, with the head containing attributes

and terminals and the tail containing only terminals. The reason for this is

again to ensure that all the decision trees created by GEP are always valid

programs. Furthermore, the size of the tail t is also dictated by the head size

h and the number of branches of the attribute with more branches n max and is

evaluated by the equation:

t = h ( n max - 1) + 1 (9.1)

So, for the play tennis data of Table 9.1, both OUTLOOK and TEMPERA-

TURE branch into three, whereas HUMIDITY and WINDY branch into two,

thus giving maximum arity n max = 3. For instance, if a head size of five is

chosen, then the tail size will be t = 5 · (3 - 1) + 1 = 11 and the gene size g

will be g = h + t = 5 + 11 = 16. One such gene is shown below:

0123456789012345

HOTbWaaaaaabbbbb (9.2)

The expression of this kind of gene is done in exactly the same manner as in

all the GEP systems, giving the following decision tree:

H

O

T

a

b

W

a

This simplified decision tree can also be represented as a conventional deci-

sion tree, that is, with all the branches clearly labeled (Figure 9.2). And as

you can easily check, it solves correctly 12 of the 14 instances of Table 9.1.

The process of decision tree induction with GEP starts, as usual, with an

initial population of randomly created chromosomes. Then these

Search WWH ::

Custom Search

Home