Table 7-4  Tree node details

Node  Rule                             Prediction    #Cases  Confidence  Support
1                                      Attriter      10      5/10 = 0.5  5/10 = 0.5
2     Age > 36                         Attriter      3       3/3 = 1.0   3/10 = 0.3
3     Age < 36                         Non-attriter  7       5/7 = 0.7   7/10 = 0.7
4     Age < 36 and Sav. Bal < 21,500   Non-attriter  5       5/5 = 1.0   5/10 = 0.5
5     Age < 36 and Sav. Bal > 21,500   Attriter      2       2/2 = 1.0   2/10 = 0.2
Table 7-4 lists tree node details, such as node ID, rule, prediction,
the number of cases that belong to the node, and the confidence and
support of the rule. For example, node-2 has three cases (1, 4, and 6)
that satisfy the predicate age > 36, and all of them are attriters; hence
this node's confidence value is 3/3 = 1, or 100 percent. However,
only 3 out of 10 cases support the rule defined by node-2, hence the
support value is 3/10 = 0.3. As node-2 has a confidence value of 1,
it is called a pure node and no further splits can be made. Node-3 can
be split further because its confidence value is less than 1, that is,
5/7 ≈ 0.71, and confidence can be improved by using the average
savings balance attribute as shown in Table 7-4. In this tree, nodes 2, 4,
and 5 are called leaf nodes, because they do not have any child nodes.
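The confidence and support arithmetic above can be sketched in a few lines of Java. The ten cases below are hypothetical (the raw data is not listed here), invented only so that the counts reproduce node-2 of Table 7-4:

```java
// Sketch: confidence and support for node-2 (predicate: age > 36).
// The case data is assumed, chosen to match the counts in Table 7-4.
public class NodeStats {
    // Age and attriter flag for ten hypothetical cases.
    static final int[] AGE = {40, 45, 50, 30, 28, 33, 25, 31, 29, 27};
    static final boolean[] ATTRITER =
        {true, true, true, true, true, false, false, false, false, false};

    // confidence = cases in the node matching its prediction / cases in the node
    static double confidenceAgeOver(int threshold) {
        int inNode = 0, matching = 0;
        for (int i = 0; i < AGE.length; i++) {
            if (AGE[i] > threshold) {
                inNode++;
                if (ATTRITER[i]) matching++;   // node-2 predicts "attriter"
            }
        }
        return (double) matching / inNode;
    }

    // support = cases in the node / total cases
    static double supportAgeOver(int threshold) {
        int inNode = 0;
        for (int age : AGE) if (age > threshold) inNode++;
        return (double) inNode / AGE.length;
    }

    public static void main(String[] args) {
        System.out.println("node-2 confidence: " + confidenceAgeOver(36)); // 3/3 = 1.0
        System.out.println("node-2 support:    " + supportAgeOver(36));    // 3/10 = 0.3
    }
}
```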
Algorithm Settings
Algorithm settings allow users to exert finer control over the algorithm
to attain better results during the build process. Decision tree models
can be extremely accurate on the build data if allowed to overfit the
build data. This occurs by allowing the algorithm to build deeper
trees with rules specific to even individual cases. Hence, overfit
models give very good accuracy with the build data, but do not
generalize well on new data, resulting in decreased predictive accuracy.
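One common way to curb this overfitting is to check a stopping condition before each split during the recursive build. The sketch below uses hypothetical parameter names (maxDepth, minLeafCases), not the API of any particular library:

```java
// Sketch: stopping criteria consulted before each split in a tree build.
// Parameter names here are illustrative assumptions, not a standard API.
public class StoppingCriteria {
    final int maxDepth;       // cap on tree depth, to avoid overly specific rules
    final int minLeafCases;   // do not split nodes with fewer cases than this

    StoppingCriteria(int maxDepth, int minLeafCases) {
        this.maxDepth = maxDepth;
        this.minLeafCases = minLeafCases;
    }

    // Returns true when the node should become a leaf instead of being split.
    boolean shouldStop(int depth, int caseCount, double confidence) {
        return depth >= maxDepth          // tree already deep enough
            || caseCount < minLeafCases   // too few cases to split reliably
            || confidence == 1.0;         // pure node: nothing left to separate
    }
}
```

Under this check, a pure node like node-2 in Table 7-4 (confidence 1.0) becomes a leaf immediately, while node-3 (confidence 5/7, seven cases) is eligible for a further split.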
To avoid overfitting, users can apply stopping criteria and pruning
techniques. Algorithms typically iterate over the build data, learning
the patterns that exist in the data or making finer distinctions. Some
algorithms could continue this iteration practically indefinitely. As
such, algorithms often provide stopping criteria, which tell the algorithm
when to stop building the model. In the case of a decision tree
algorithm, stopping criteria are used to avoid model overfitting and
control tree size. Decision tree stopping criteria include maximum depth of
the tree to avoid deep trees with too many predicates, minimum leaf