The information gain computed for each of the input variables is shown in Table 7.2.
Attribute poutcome has the highest information gain and is the most informative
variable. Therefore, poutcome is chosen for the first split of the decision tree,
as shown in Figure 7.4. The information gain values in Table 7.2 are small in
magnitude, but it is their relative differences that matter: at each round, the
algorithm splits on the attribute with the largest information gain.
Table 7.2 Calculating Information Gain of Input Variables for the First Split
Attribute    Information Gain
poutcome     0.0289
contact      0.0201
housing      0.0133
job          0.0101
education    0.0034
marital      0.0018
loan         0.0010
default      0.0005
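The information gain calculation described above can be sketched in a few lines of Python. The toy records and the column names (poutcome, subscribed) are hypothetical stand-ins for illustration, not the actual bank-marketing dataset used in this chapter:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(records, attribute, target):
    """Entropy of the target minus the weighted entropy of the
    target after splitting records (a list of dicts) on attribute."""
    base = entropy([r[target] for r in records])
    splits = {}
    for r in records:
        splits.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(v) / len(records) * entropy(v)
                    for v in splits.values())
    return base - remainder

# Hypothetical toy data, not the values behind Table 7.2
data = [
    {"poutcome": "success", "subscribed": "yes"},
    {"poutcome": "success", "subscribed": "yes"},
    {"poutcome": "failure", "subscribed": "no"},
    {"poutcome": "failure", "subscribed": "no"},
    {"poutcome": "unknown", "subscribed": "no"},
    {"poutcome": "unknown", "subscribed": "yes"},
]
print(round(information_gain(data, "poutcome", "subscribed"), 4))
```

A decision tree induction algorithm would evaluate information_gain once per candidate attribute and, as the text notes, split on the attribute with the largest value.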
Detecting Significant Splits
It is often necessary to measure the significance of a split in a decision tree,
especially when the information gain is small, as it is in Table 7.2.
Let n_A and n_B be the number of class A and class B records in the parent node.
Let n_AL represent the number of class A records going to the left child node,
n_BL represent the number of class B records going to the left child node,
n_AR represent the number of class A records going to the right child node, and
n_BR represent the number of class B records going to the right child node. Let

    p_L = (n_AL + n_BL) / (n_A + n_B)   and   p_R = (n_AR + n_BR) / (n_A + n_B)

denote the proportion of data going to the left and right child node,
respectively.
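Using the counts just defined, the proportions p_L and p_R follow directly, and one common way to gauge whether a split is significant is Pearson's chi-square statistic on the 2x2 table of class (A or B) by child node (left or right). This is a hedged illustration of the general idea, not necessarily the exact statistic developed in this text:

```python
def split_proportions(n_AL, n_BL, n_AR, n_BR):
    """p_L and p_R: the fraction of parent records sent to each child."""
    n = n_AL + n_BL + n_AR + n_BR  # total records in the parent, n_A + n_B
    return (n_AL + n_BL) / n, (n_AR + n_BR) / n

def chi_square_statistic(n_AL, n_BL, n_AR, n_BR):
    """Pearson chi-square statistic for the 2x2 contingency table of
    (class A / class B) x (left child / right child). Expected counts
    assume the split leaves the class proportions unchanged."""
    n_A, n_B = n_AL + n_AR, n_BL + n_BR
    p_L, p_R = split_proportions(n_AL, n_BL, n_AR, n_BR)
    stat = 0.0
    for observed, expected in [
        (n_AL, n_A * p_L), (n_AR, n_A * p_R),
        (n_BL, n_B * p_L), (n_BR, n_B * p_R),
    ]:
        stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: a split that separates the classes well
print(chi_square_statistic(40, 10, 10, 40))
```

A large statistic (relative to a chi-square distribution with one degree of freedom) indicates the class distribution in the children genuinely differs from the parent, so the split is unlikely to be due to chance.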