Table 7-4  Tree node details

Node  Rule                             Prediction    #Cases  Confidence  Support
1                                      Attriter      10      5/10 = 0.5  5/10 = 0.5
2     Age > 36                         Attriter      3       3/3 = 1.0   3/10 = 0.3
3     Age < 36                         Non-attriter  7       5/7 = 0.7   7/10 = 0.7
4     Age < 36 and Sav. Bal < 21,500   Non-attriter  5       5/5 = 1.0   5/10 = 0.5
5     Age < 36 and Sav. Bal > 21,500   Attriter      2       2/2 = 1.0   2/10 = 0.2
Table 7-4 lists tree node details, such as node ID, rule, prediction,
the number of cases that belong to the node, and the confidence and
support of the rule. For example, node-2 has three cases (1, 4, and 6)
that satisfy the predicate age > 36, and all of them are attriters; hence
this node's confidence value is 3/3 = 1, or 100 percent. However,
only 3 out of 10 cases support the rule defined by node-2, hence the
support value is 3/10 = 0.3. As node-2 has a confidence value of 1,
it is called a pure node and no further splits can be made. Node-3 can
be split further because its confidence value is less than 1, that is,
5/7 ≈ 0.71, and confidence can be improved by using the average
savings balance attribute as shown in Table 7-4. In this tree, nodes 2, 4,
and 5 are called leaf nodes, because they do not have any child nodes.
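The confidence and support arithmetic above can be sketched in a few lines of Java. The ten cases below are hypothetical (the raw data is not listed here), invented only so that the counts reproduce node-2 of Table 7-4:

```java
// Sketch: confidence and support for node-2 (predicate: age > 36).
// The case data is assumed, chosen to match the counts in Table 7-4.
public class NodeStats {
    // Age and attriter flag for ten hypothetical cases.
    static final int[] AGE = {40, 45, 50, 30, 28, 33, 25, 31, 29, 27};
    static final boolean[] ATTRITER =
        {true, true, true, true, true, false, false, false, false, false};

    // confidence = cases in the node matching its prediction / cases in the node
    static double confidenceAgeOver(int threshold) {
        int inNode = 0, matching = 0;
        for (int i = 0; i < AGE.length; i++) {
            if (AGE[i] > threshold) {
                inNode++;
                if (ATTRITER[i]) matching++;   // node-2 predicts "attriter"
            }
        }
        return (double) matching / inNode;
    }

    // support = cases in the node / total cases
    static double supportAgeOver(int threshold) {
        int inNode = 0;
        for (int age : AGE) if (age > threshold) inNode++;
        return (double) inNode / AGE.length;
    }

    public static void main(String[] args) {
        System.out.println("node-2 confidence: " + confidenceAgeOver(36)); // 3/3 = 1.0
        System.out.println("node-2 support:    " + supportAgeOver(36));    // 3/10 = 0.3
    }
}
```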
Algorithm Settings
Algorithm settings allow users to exert finer control over the algorithm
to attain better results during the build process. Decision tree models
can be extremely accurate on the build data if allowed to overfit the
build data. This occurs by allowing the algorithm to build deeper
trees with rules specific to even individual cases. Hence, overfit
models give very good accuracy with the build data, but do not
generalize well on new data, resulting in decreased predictive accuracy.
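One common way to curb this overfitting is to check a stopping condition before each split during the recursive build. The sketch below uses hypothetical parameter names (maxDepth, minLeafCases), not the API of any particular library:

```java
// Sketch: stopping criteria consulted before each split in a tree build.
// Parameter names here are illustrative assumptions, not a standard API.
public class StoppingCriteria {
    final int maxDepth;       // cap on tree depth, to avoid overly specific rules
    final int minLeafCases;   // do not split nodes with fewer cases than this

    StoppingCriteria(int maxDepth, int minLeafCases) {
        this.maxDepth = maxDepth;
        this.minLeafCases = minLeafCases;
    }

    // Returns true when the node should become a leaf instead of being split.
    boolean shouldStop(int depth, int caseCount, double confidence) {
        return depth >= maxDepth          // tree already deep enough
            || caseCount < minLeafCases   // too few cases to split reliably
            || confidence == 1.0;         // pure node: nothing left to separate
    }
}
```

Under this check, a pure node like node-2 in Table 7-4 (confidence 1.0) becomes a leaf immediately, while node-3 (confidence 5/7, seven cases) is eligible for a further split.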
To avoid overfitting, users can apply stopping criteria and pruning
techniques. Algorithms typically iterate over the build data, learning
the patterns that exist in the data or making finer distinctions. Some
algorithms could continue this iteration practically indefinitely. As
such, algorithms often provide stopping criteria, which tell the algorithm
when to stop building the model. In the case of a decision tree
algorithm, stopping criteria are used to avoid model overfitting and
control tree size. Decision tree stopping criteria include maximum depth of
the tree to avoid deep trees with too many predicates, minimum leaf