9
Decision Tree Induction
Decision tree induction is extremely popular in data mining, with most
currently available techniques being refinements of Quinlan's original work
(Quinlan 1986). His divide-and-conquer approach to decision tree induction
involves selecting an attribute to place at the root node and then making the
same decision at every other node in the tree.
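
The divide-and-conquer procedure described above can be sketched in a few lines of Python. This is only a minimal illustration for nominal attributes, assuming information gain as the attribute selection criterion, as in Quinlan's ID3. The function names (entropy, information_gain, induce_tree, classify) and the toy weather data are hypothetical and are not taken from the original work.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on one nominal attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def induce_tree(rows, labels, attrs):
    """Build a tree recursively.

    A leaf is a class label (a string); an internal node is a tuple
    (attribute, {attribute value: subtree}).
    """
    if len(set(labels)) == 1:      # all samples agree: make a leaf
        return labels[0]
    if not attrs:                  # nothing left to split on: majority class
        return Counter(labels).most_common(1)[0][0]
    # Select the attribute for this node, then repeat the same decision
    # for each branch on the samples that reach it.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in set(row[best] for row in rows):
        pairs = [(r, lab) for r, lab in zip(rows, labels) if r[best] == value]
        branches[value] = induce_tree(
            [r for r, _ in pairs], [lab for _, lab in pairs], remaining
        )
    return (best, branches)


def classify(tree, row):
    """Follow the branches matching the sample until a leaf is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree


# Toy data, invented purely for illustration.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
    {"outlook": "overcast", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "play", "play", "play", "stay"]

tree = induce_tree(rows, labels, ["outlook", "windy"])
print(classify(tree, {"outlook": "rainy", "windy": "yes"}))
```

Note that the sketch makes no attempt at pruning or at handling attribute values unseen during training; it only shows the recursive structure of the approach.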
Gene expression programming can also be used to design decision trees,
with the advantage that all the decisions concerning the growth of the tree
are made by the algorithm itself without any kind of human input, that is, the
growth of the tree is totally determined and refined by evolution.
There are basically two different types of decision trees. The first and
simplest type has only nominal attributes. Inducing decision trees with both
nominal and numeric attributes (mixed attributes) is considerably more
complicated, and more sophisticated methods are required to grow the trees.
This aspect of decision tree induction carries over to gene expression
programming, and I developed two different algorithms to deal with both types
of decision trees. The first one, evolvable decision trees or EDT for short,
induces decision trees with nominal attributes. The second one, evolvable
decision trees with random numerical constants or EDT-RNC for short, was
developed for handling numeric attributes but can in fact handle all kinds of
attributes: decision trees with just numeric attributes, decision trees with
just nominal attributes, and decision trees with both nominal and numeric
attributes.
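
To make the distinction between the two kinds of attributes concrete, the following Python sketch shows one common way a tree node can test each kind: a nominal node has one branch per attribute value, whereas a numeric node compares the attribute against a constant. This is a generic illustration under stated assumptions, not the EDT or EDT-RNC encoding; the class names, the "value <= threshold" test, and the example attributes and constants are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class NominalNode:
    attr: str
    branches: Dict[str, Any]   # one subtree (or leaf label) per attribute value


@dataclass
class NumericNode:
    attr: str
    threshold: float           # constant the attribute is compared against
    below: Any                 # subtree taken when value <= threshold
    above: Any                 # subtree taken when value > threshold


def classify_mixed(tree, row):
    """Walk a tree with mixed attributes; leaves are plain class labels."""
    while not isinstance(tree, str):
        if isinstance(tree, NominalNode):
            tree = tree.branches[row[tree.attr]]
        else:
            value = row[tree.attr]
            tree = tree.below if value <= tree.threshold else tree.above
    return tree


# A hand-built tree with one numeric and one nominal test (invented values).
mixed_tree = NumericNode(
    attr="temperature",
    threshold=37.5,
    below="stable",
    above=NominalNode(
        attr="comfort",
        branches={"good": "stable", "poor": "unstable"},
    ),
)

print(classify_mixed(mixed_tree, {"temperature": 38.2, "comfort": "poor"}))
```

The numeric test needs a constant for every node that uses it, which is one reason why growing trees with numeric attributes is harder than growing trees with nominal attributes alone.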
This chapter explains how both algorithms are implemented and how they work.
We will also analyze their performance by solving four real-world
classification problems: the already familiar breast cancer and iris
problems, both of them with numeric attributes, and two new challenging
problems: the lymphography problem with mixed attributes and the
postoperative patient problem with nominal attributes.