along doing calculations in your head. The classification error of the full dataset
is 0.5 since there are an equal number of buyers (Y) and non-buyers (N).
A split on Gender would create a node of male (M) visitors with 3 buyers and
2 non-buyers. The other node would be female (F) visitors with 2 buyers
and 3 non-buyers. The index of each node is 0.4, resulting in a gain for the
split of 0.1 (0.5 - 0.4).
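The arithmetic for the Gender split can be reproduced with a short sketch. The function names below are illustrative rather than taken from the text, and exact fractions are used only to avoid rounding noise; the gain is taken as the parent's classification error minus the size-weighted average error of the child nodes.

```python
from fractions import Fraction

def classification_error(buyers, non_buyers):
    """Share of observations that fall outside the node's majority class."""
    total = buyers + non_buyers
    return Fraction(min(buyers, non_buyers), total)

def split_gain(parent, children):
    """Parent error minus the size-weighted average error of the children."""
    parent_total = sum(parent)
    weighted = sum(
        Fraction(sum(child), parent_total) * classification_error(*child)
        for child in children
    )
    return classification_error(*parent) - weighted

# (buyers, non_buyers) counts as stated in the text
full_dataset = (5, 5)
gender_nodes = [(3, 2), (2, 3)]  # male node, female node

gender_gain = split_gain(full_dataset, gender_nodes)
print(float(gender_gain))  # 0.1
```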
A split on MaritalStatus would create a node of single (S) visitors with 3
buyers and 1 non-buyer. The other node would be married (M) visitors with
2 buyers and 4 non-buyers. The index of the first is 0.25 and of the second is
0.33. Weighting each node by its share of the observations (0.4 x 0.25 +
0.6 x 0.33) gives 0.3, resulting in a gain for the split of 0.2 (0.5 - 0.3).
A split on SpeedingCitations at 2 (the best available) would create a node of
speeders (> 2) with 4 buyers and 2 non-buyers. The other node would be
slow folks (< 2) with 1 buyer and 3 non-buyers. The index of the first is 0.33
and of the second is 0.25, resulting in a gain of 0.2 (0.5 - 0.3).
Since the gains of the MaritalStatus and SpeedingCitations splits are equal,
either could be chosen. Figure 4.1 shows the decision tree after splitting on
SpeedingCitations.
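The comparison of the three candidate splits, including the tie, can be sketched as follows. The helper names and the dictionary layout are assumptions made for illustration; the counts are the ones quoted in the preceding paragraphs.

```python
from fractions import Fraction

def error(node):
    """Classification error of a (buyers, non_buyers) node."""
    return Fraction(min(node), sum(node))

def gain(parent, children):
    """Parent error minus the size-weighted error of the child nodes."""
    n = sum(parent)
    return error(parent) - sum(Fraction(sum(c), n) * error(c) for c in children)

parent = (5, 5)
candidates = {
    "Gender": [(3, 2), (2, 3)],
    "MaritalStatus": [(3, 1), (2, 4)],
    "SpeedingCitations": [(4, 2), (1, 3)],
}

gains = {name: gain(parent, nodes) for name, nodes in candidates.items()}
best = max(gains.values())
# Every attribute that reaches the best gain is an equally valid choice.
tied = [name for name, g in gains.items() if g == best]
print(tied)  # ['MaritalStatus', 'SpeedingCitations']
```

How a tie is broken is left open here, as in the text; the example tree simply proceeds with SpeedingCitations.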
Focusing on the left node, an evaluation of gain weakly recommends splitting
on either Gender or again on SpeedingCitations. We say weakly, because using
classification error as the index, the gain is 0.0 for each. When using the more
complex Gini index, both potential splits generate a positive gain. The gain of
SpeedingCitations is greatest. Although not detailed in the example, the same
splitting methodology could also be executed to process the right node at the
second level. The resulting tree is in Figure 4.2 after a split using
SpeedingCitations on the left and MaritalStatus on the right.
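To see how classification error can report a gain of 0.0 while the Gini index reports a positive gain, consider the sketch below. The left node's counts (1 buyer, 3 non-buyers) come from the text, but the child counts are a hypothetical split chosen only for illustration, since the passage does not list them. The Gini index is taken in its usual form, one minus the sum of squared class proportions.

```python
from fractions import Fraction

def classification_error(node):
    """One minus the majority-class proportion."""
    return 1 - Fraction(max(node), sum(node))

def gini(node):
    """One minus the sum of squared class proportions."""
    total = sum(node)
    return 1 - sum(Fraction(count, total) ** 2 for count in node)

def gain(index, parent, children):
    """Parent index minus the size-weighted index of the child nodes."""
    n = sum(parent)
    return index(parent) - sum(Fraction(sum(c), n) * index(c) for c in children)

left_node = (1, 3)                        # (buyers, non_buyers), from the text
hypothetical_children = [(1, 1), (0, 2)]  # assumed counts, not from the text

error_gain = gain(classification_error, left_node, hypothetical_children)
gini_gain = gain(gini, left_node, hypothetical_children)
print(error_gain)  # 0
print(gini_gain)   # 1/8
```

Under these assumed counts the split isolates a pure node of non-buyers, which the Gini index rewards even though the number of misclassified observations is unchanged.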
In Figure 4.2, the bottom node on the left must be terminated because it
contains only non-buyers (stop rule 1). The second node from the left must also
be terminated because the input attributes are identical (stop rule 2).
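The two stop rules, as they are applied in this paragraph, might be checked along these lines. This is a minimal sketch: the function name, the row layout, and the sample rows are all hypothetical, and the rules are interpreted only from the wording above (rule 1: every observation has the same target value; rule 2: every observation has the same input attribute values).

```python
def stop_rule(rows, target="Buyer"):
    """Return the stop rule that applies to a node, or None if it can be split."""
    # Stop rule 1: the node is pure, so no split can improve it.
    if len({row[target] for row in rows}) <= 1:
        return 1
    # Stop rule 2: all inputs are identical, so no attribute can separate the rows.
    inputs = {
        tuple(sorted((k, v) for k, v in row.items() if k != target))
        for row in rows
    }
    if len(inputs) == 1:
        return 2
    return None

# Hypothetical rows, not the dataset used in the text
pure_node = [
    {"Gender": "M", "SpeedingCitations": 0, "Buyer": "N"},
    {"Gender": "F", "SpeedingCitations": 0, "Buyer": "N"},
]
identical_inputs = [
    {"Gender": "F", "SpeedingCitations": 1, "Buyer": "Y"},
    {"Gender": "F", "SpeedingCitations": 1, "Buyer": "N"},
]
print(stop_rule(pure_node), stop_rule(identical_inputs))  # 1 2
```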
Root node (#Obs by Buyer): Y 5, N 5, split on SpeedingCitations
  <2 branch (#Obs by Buyer): Y 1, N 3
  >2 branch (#Obs by Buyer): Y 4, N 2
Figure 4.1 Decision Tree