The attribute poutcome has the largest information gain and is therefore the most informative variable. Consequently, poutcome is chosen for the first split of the decision tree, as shown in Figure 7.4. The values of information gain in Table 7.2 are small in magnitude, but the relative difference matters. The algorithm splits on the attribute with the largest information gain at each round.
Table 7.2 Calculating Information Gain of Input Variables for the First Split

Attribute    Information Gain
poutcome     0.0289
contact      0.0201
housing      0.0133
job          0.0101
education    0.0034
marital      0.0018
loan         0.0010
default      0.0005
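
The selection rule can be sketched in a few lines of Python. This is a minimal illustration, assuming entropy-based information gain over categorical attributes; the function names (entropy, information_gain, best_attribute) and the toy records are hypothetical and are not taken from the bank marketing data used in the text.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after
    grouping the records by the attribute's values."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def best_attribute(columns, labels):
    """Name of the attribute with the largest information gain."""
    return max(columns, key=lambda name: information_gain(columns[name], labels))


# Hypothetical toy records (assumption: not the dataset from the text).
columns = {
    "contact": ["cell", "cell", "phone", "phone"],
    "loan": ["yes", "no", "yes", "no"],
}
labels = ["yes", "yes", "no", "no"]
print(best_attribute(columns, labels))

# Applying the same "largest gain wins" rule to the values in Table 7.2.
table_7_2 = {
    "poutcome": 0.0289, "contact": 0.0201, "housing": 0.0133, "job": 0.0101,
    "education": 0.0034, "marital": 0.0018, "loan": 0.0010, "default": 0.0005,
}
first_split = max(table_7_2, key=table_7_2.get)
print(first_split)
```

In the toy data, contact separates the two classes perfectly while loan does not separate them at all, so contact is selected. With the gains from Table 7.2, the same rule picks poutcome even though every value is small, because only the ordering of the gains drives the choice.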
Detecting Significant Splits
Quite often it is necessary to measure the significance of a split in a decision tree.
Let $N_A$ and $N_B$ be the number of class A and class B records in the parent node. Let $N_{AL}$ represent the number of class A going to the left child node, $N_{BL}$ represent the number of class B going to the left child node, $N_{AR}$ represent the number of class A going to the right child node, and $N_{BR}$ represent the number of class B going to the right child node.
Let $p_L$ and $p_R$ denote the proportion of data going to the left and right node, respectively.
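
As a rough sketch of these quantities, the following Python tallies the class counts in the parent and in each child of a binary split, then derives the two proportions. The function name split_counts, the dictionary keys (A, B, AL, BL, AR, BR), and the sample records are illustrative assumptions, and the code assumes exactly two classes labeled "A" and "B".

```python
def split_counts(labels, goes_left):
    """Tally class A / class B counts for the parent node and for the
    left and right children of a binary split.

    labels    -- sequence of "A" or "B" (assumed two-class problem)
    goes_left -- sequence of booleans, True if the record goes left
    """
    n = {"A": 0, "B": 0, "AL": 0, "BL": 0, "AR": 0, "BR": 0}
    for label, left in zip(labels, goes_left):
        n[label] += 1                             # parent count
        n[label + ("L" if left else "R")] += 1    # child count
    total = n["A"] + n["B"]
    p_left = (n["AL"] + n["BL"]) / total          # share of data going left
    p_right = 1.0 - p_left                        # share of data going right
    return n, p_left, p_right


# Hypothetical records: three of class A, five of class B.
labels = ["A", "A", "A", "B", "B", "B", "B", "B"]
goes_left = [True, True, False, True, False, False, False, False]

counts, p_left, p_right = split_counts(labels, goes_left)
print(counts, p_left, p_right)
```

Here three of the eight records go left (two of class A and one of class B), so the left proportion is 0.375 and the right proportion is 0.625.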