The decision tree algorithm is one of the most popular algorithms
because it is easy to understand how it makes predictions. A decision
tree produces rules that not only explain how or why a prediction was
made, but are also useful in segmenting a population, that is, showing
which groupings of cases produce a certain outcome. Decision trees are
widely used for classification, and some implementations also support
regression. In this section, we give an overview of the decision tree
algorithm and discuss concepts behind its settings as defined in JDM.

Decision tree models are a lot like playing the game 20 Questions
[20QNET], where a player asks a series of questions of a person
concealing the name of an object. These questions allow the player to
keep narrowing the space of possible objects. When the space is
sufficiently constrained, a guess can be made about the name of the
object. In playing 20 Questions, we rely on a vast range of experience
acquired over many years to know which questions to ask and what
the likely outcome is. With decision trees, an algorithm looks over a
constrained set of experience, that is, the dataset. It then determines
which questions can be asked to produce the right answer, that is,
to classify each case correctly.
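
To make the idea of "choosing a question" concrete, the following is a minimal sketch of how an algorithm might pick one yes/no question on a numeric attribute. It assumes Gini impurity as the quality measure, which is one common choice and is not named in the text; the function names and the toy data are hypothetical and are not the values from Table 7-3.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_question(values, labels):
    """Return (threshold, weighted_impurity) for the best 'value > threshold?' question.

    Returns None when the attribute has fewer than two distinct values.
    """
    best = None
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        yes = [lab for v, lab in zip(values, labels) if v > threshold]
        no = [lab for v, lab in zip(values, labels) if v <= threshold]
        # Weight each side's impurity by the share of cases it receives.
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data for illustration only.
ages = [22, 25, 30, 34, 38, 41, 45, 52]
attrited = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]
threshold, impurity = best_question(ages, attrited)
print(threshold, impurity)
```

A real implementation would repeat this search over every active attribute and then recurse into each resulting partition.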

In this example, let us assume the input dataset has 10 customer
cases and only three active attributes from the CUSTOMERS dataset
introduced in Section 7.1.3: age, capital gains, and average savings
account balance. Each case has a known target value, as shown in
Table 7-3. Note that 5 out of 10 customers attrite; hence, there is a
50 percent chance that a randomly selected customer will attrite.
Using the attribute details in this dataset, a decision tree
algorithm can learn data patterns and build a tree as shown in
Figure 7-3.

In a decision tree, each node-split is based on an attribute
condition that partitions or splits the data. In this example, the
tree root node, node-1, shown in Figure 7-3, represents all 10
customers in the dataset. From these 10 customer cases the algorithm
learns that customers whose age is greater than 36 are likely to
attrite. So node-1 splits data into node-2 and node-3 based on the
customer's age. Node-3 further splits its data into node-4 and node-5
based on the customer's savings account balance.
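
The node structure just described can be sketched as a small routing function that sends a customer case to a leaf. This is only an illustration under stated assumptions: the text gives the age condition (greater than 36) but does not say which branch is node-2, nor the savings balance threshold or its direction, so the branch assignment, the `savings_threshold` parameter, and the function name `leaf_for` are all hypothetical.

```python
def leaf_for(age, savings_balance, savings_threshold):
    """Route one customer case through the tree and return the leaf it reaches."""
    # node-1 (root): the age condition stated in the text.
    if age > 36:
        # Assumption: the "age > 36" branch is node-2, which is not split further.
        return "node-2"
    # Assumption: the remaining cases form node-3, which splits on savings balance.
    # The threshold and the direction of the comparison are hypothetical.
    if savings_balance > savings_threshold:
        return "node-4"
    return "node-5"


# Hypothetical cases and threshold, for illustration only.
print(leaf_for(age=45, savings_balance=1000.0, savings_threshold=5000.0))
print(leaf_for(age=30, savings_balance=8000.0, savings_threshold=5000.0))
print(leaf_for(age=30, savings_balance=2000.0, savings_threshold=5000.0))
```

Each path from the root to a leaf corresponds to one rule, formed by the conjunction of the conditions tested along the way.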

Each tree node has an associated rule that predicts the target value
with a certain confidence and support. The confidence value is a measure
of likelihood that the tree node will correctly predict the target value.
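
As a rough illustration, the two measures can be computed from the cases that fall into a node. This sketch assumes that a node predicts its most frequent target value, that confidence is the fraction of the node's cases with that value, and that support is the fraction of all cases that reach the node; these are common definitions and may differ in detail from a particular implementation. The function name and the child-node data are hypothetical.

```python
from collections import Counter


def rule_confidence_and_support(node_targets, total_cases):
    """Return (predicted_value, confidence, support) for one tree node.

    node_targets: known target values of the cases that reach the node.
    total_cases:  number of cases in the whole dataset.
    """
    # Assumption: the node predicts its most frequent target value.
    predicted, hits = Counter(node_targets).most_common(1)[0]
    confidence = hits / len(node_targets)
    support = len(node_targets) / total_cases
    return predicted, confidence, support


# Hypothetical child node: 4 of the 10 cases reach it, and 3 of them attrite.
node_targets = ["attrite", "attrite", "attrite", "no attrite"]
predicted, confidence, support = rule_confidence_and_support(node_targets, 10)
print(predicted, confidence, support)
```

Under these assumptions the root node of the example, which holds all 10 cases with 5 attriters, would have a confidence of 0.5 and a support of 1.0.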