Table 9.9. Organization of the postoperative patient dataset.

  Attribute   Symbol  Branches                       Arity
  L-CORE      A       high, low, mid                 3
  L-SURF      B       high, low, mid                 3
  L-O2        C       excellent, good                2
  L-BP        D       high, low, mid                 3
  SURF-STBL   E       stable, unstable               2
  CORE-STBL   F       mod-stable, stable, unstable   3
  BP-STBL     G       mod-stable, stable, unstable   3
  COMFORT     H       05, 07, 10, 15, ?              5
For this experiment, a subset of 60 samples was randomly selected for
training and the remaining 30 were used for testing (both sets are
available at the gene expression programming website). The fitness
function was again based on the number of hits and was evaluated by
equation (3.8). As shown in Table 9.9, the eight attributes were
represented by A = {A, ..., H}, splitting respectively into 3, 3, 2, 3,
2, 3, 3, and 5 branches (note that "H" divides into five branches
because of missing values, which are simply handled as an additional
branch). The terminal set was T = {a, b, c}, representing respectively
classes "A", "I", and "S". Both the performance and the parameters used
per run are shown in Table 9.10.
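Since the hits-based fitness of equation (3.8) simply counts correctly classified samples, it can be sketched in a few lines (the function name and list-based interface here are illustrative, not from the original):

```python
def hits_fitness(predictions, targets):
    """Number-of-hits fitness: one point per sample whose predicted
    class matches its true class (cf. equation (3.8))."""
    return sum(p == t for p, t in zip(predictions, targets))

# With 60 training samples, a perfect classifier would score 60.
print(hits_fitness(list("aabca"), list("aabcc")))  # 4 hits out of 5
```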
As shown in Table 9.10, the EDT algorithm performs quite well on this
task, with an average best-of-run fitness of 47.14. Indeed, several good
solutions were designed in this experiment, two of which are shown below:
GDAaHaBcBFcaaabaccaacaacab...
...acbcabbbcacaaabcaaaacccca
(9.10)
GaAEcBFBAaacaacacaacacbccc...
...bbbbcbbbcbbcacccbababcabc
(9.11)
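The node counts claimed below can be checked without drawing anything: a GEP chromosome is expanded breadth-first, with each symbol consuming as many symbols on the next level as its arity (given in Table 9.9), and the terminals a, b, and c consuming none. A minimal sketch of this expansion, counting only the nodes actually used (the function name is illustrative):

```python
# Arities of the attribute symbols (Table 9.9) and the class terminals.
ARITY = {'A': 3, 'B': 3, 'C': 2, 'D': 3, 'E': 2, 'F': 3, 'G': 3, 'H': 5,
         'a': 0, 'b': 0, 'c': 0}

def tree_size(chromosome):
    """Count the nodes used when a GEP chromosome is expanded
    breadth-first (level by level) into a decision tree."""
    level = 1   # the root occupies the first position
    used = 0
    pos = 0
    while level > 0:
        # Each symbol on this level demands `arity` children on the next.
        next_level = sum(ARITY[chromosome[pos + i]] for i in range(level))
        used += level
        pos += level
        level = next_level
    return used

dt1 = ("GDAaHaBcBFcaaabaccaacaacab"
       "acbcabbbcacaaabcaaaacccca")   # chromosome (9.10)
dt2 = ("GaAEcBFBAaacaacacaacacbccc"
       "bbbbcbbbcbbcacccbababcabc")   # chromosome (9.11)
print(tree_size(dt1))  # 24
print(tree_size(dt2))  # 21
```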
As you can see by drawing the trees, the first one encodes a decision tree
with a total of 24 nodes, whereas the second one encodes a DT with 21
nodes. These highly compact models are extremely accurate: the first one
has a training fitness of 50 (83.33% accuracy) and a testing fitness of 23
(76.67% accuracy), whereas the second one has a training fitness of 49
(81.67% accuracy) and a testing fitness of 24 (80.00% accuracy) and are,