Chapter 7
Popular Decision Trees Induction Algorithms
7.1 Overview
In this chapter, we briefly review some of the popular decision tree
induction algorithms, including ID3, C4.5, and CART. All of these
algorithms use splitting criteria and pruning methods that were
already described in previous chapters. Therefore, the aim of this chapter
is merely to indicate which settings each algorithm uses and what
the advantages and disadvantages of each algorithm are.
7.2 ID3
The ID3 algorithm is considered to be a very simple decision tree algorithm
[Quinlan (1986)]. Using information gain as its splitting criterion, ID3
stops growing the tree when all instances belong to a single value of the
target feature or when the best information gain is not greater than zero. ID3
does not apply any pruning procedure, nor does it handle numeric attributes
or missing values.
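The growth and stopping rules just described can be sketched in a few lines of Python. This is a minimal illustration rather than Quinlan's original implementation: the function names (entropy, information_gain, id3), the nested-dictionary tree representation, the majority-class leaf, the extra stop when no attributes remain, and the small floating-point tolerance are all assumptions made for the example. Only nominal attributes without missing values are supported, in line with the limitations noted above.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by partitioning on the nominal attribute attr."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p)
                    for p in partitions.values())
    return entropy(labels) - remainder

def id3(rows, labels, attrs, tol=1e-12):
    """Return a nested-dict tree; leaves are class labels (assumed representation)."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop: all instances share one target value.
    # Assumption: also stop when no attributes are left to split on.
    if len(set(labels)) == 1 or not attrs:
        return majority
    gains = {a: information_gain(rows, labels, a) for a in attrs}
    best = max(gains, key=gains.get)
    # Stop: the best information gain is not greater than zero
    # (tol absorbs floating-point rounding; it is an assumption).
    if gains[best] <= tol:
        return majority
    tree = {best: {}}
    remaining = [a for a in attrs if a != best]
    for value in sorted({row[best] for row in rows}):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in pairs]
        sub_labels = [l for _, l in pairs]
        tree[best][value] = id3(sub_rows, sub_labels, remaining)
    return tree

# Hypothetical toy data: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
print(id3(rows, labels, ["outlook", "windy"]))
```

On this toy data the split on outlook yields pure partitions, so the recursion stops after one level; windy has zero gain and is never selected.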
The main advantage of ID3 is its simplicity. For this reason, the ID3
algorithm is frequently used for teaching purposes. However, ID3 has several
disadvantages:
(1) ID3 does not guarantee an optimal solution; it can get stuck in local
optima because it uses a greedy strategy. To avoid local optima,
backtracking can be used during the search.
(2) ID3 can overfit to the training data. To avoid overfitting, smaller
decision trees should be preferred over larger ones. This algorithm