CHAPTER SUMMARY
Decision trees are excellent predictive models when the target attribute is categorical in nature and
when the data set contains attributes of mixed types. Although this chapter's data sets did not contain
any examples, decision trees are better than more statistics-based approaches at handling attributes
with missing or inconsistent values that were not handled during data preparation; decision trees will
work around such data and still generate usable results.
Decision trees are made of nodes and leaves connected by labeled branch arrows. The nodes represent
the best predictor attributes in a data set, and each leaf carries a prediction along with confidence
percentages based on the actual attribute values in the training data set. The tree can then be applied
to similarly structured scoring data in order to generate predictions for the scoring observations.
Decision trees tell us what is predicted, how confident we can be in the prediction, and how we arrived
at the prediction. The 'how we arrived at' portion of a decision tree's output is shown in a graphical
view of the tree.
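As a rough illustration of the three outputs described above (the prediction, the confidence, and
the path taken), here is a minimal Python sketch. The tree, its attribute names (age, income), and
the leaf counts are all hypothetical, and the sketch assumes that a leaf's confidence is simply the
share of training observations at that leaf that belong to the predicted class. It is not the
chapter's model or any tool's actual implementation.

```python
# Hypothetical hand-built tree. Internal nodes test one attribute;
# leaves hold counts of training observations per target class.
TREE = {
    "attribute": "age",
    "branches": {
        "young": {"counts": {"yes": 8, "no": 2}},
        "old": {
            "attribute": "income",
            "branches": {
                "high": {"counts": {"yes": 3, "no": 1}},
                "low": {"counts": {"yes": 1, "no": 5}},
            },
        },
    },
}


def score(tree, row):
    """Return (prediction, confidence, path) for one scoring observation."""
    path = []
    node = tree
    # Follow the labeled branches until a leaf is reached.
    while "counts" not in node:
        attribute = node["attribute"]
        value = row[attribute]
        path.append(f"{attribute} = {value}")
        node = node["branches"][value]
    counts = node["counts"]
    # The prediction is the majority class at the leaf.
    prediction = max(counts, key=counts.get)
    # Assumed definition: confidence is that class's share of the leaf.
    confidence = counts[prediction] / sum(counts.values())
    return prediction, confidence, path


result = score(TREE, {"age": "old", "income": "low"})
print(result)
```

The returned path plays the role of the graphical tree view: it lists the branch labels followed
from the root to the leaf that produced the prediction.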
REVIEW QUESTIONS
1) What characteristics of a data set's attributes might prompt you to choose a decision tree
data mining methodology, rather than a logistic or linear regression approach? Why?
2) Run this chapter's model using the gain_ratio algorithm and make a note of three or four
individuals' predictions and confidences. Then re-run the model under gini_index. Locate
the people you noted. Did their predictions and/or confidences change? Look at their
attribute values and compare them to the nodes and leaves in the decision tree. Explain
why you think at least one person's prediction changed under Gini, based on that person's
attributes and the tree's nodes.
3) What are confidence percentages used for, and why would they be important to consider,
in addition to just considering the prediction attribute?
4) How do you keep an attribute, such as a person's name or ID number, that should not be
considered predictive in a process's model, but is useful to have in the data mining results?
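
As background for question 2, the two split criteria it names can be sketched in a few lines of
Python. The functions below follow the common textbook definitions (gain ratio as information gain
divided by split information, and Gini gain as the reduction in Gini impurity); that is an
assumption, and the data mining tool's own calculations may differ in detail. The rows and the
attribute names (plan, region) are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def partition(rows, attribute):
    """Group the target labels by the value of one attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row["label"])
    return list(groups.values())


def gain_ratio(rows, attribute):
    """Information gain divided by split information (textbook form)."""
    labels = [row["label"] for row in rows]
    groups = partition(rows, attribute)
    total = len(rows)
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    info_gain = entropy(labels) - remainder
    split_info = -sum(len(g) / total * log2(len(g) / total) for g in groups)
    return info_gain / split_info if split_info > 0 else 0.0


def gini_gain(rows, attribute):
    """Reduction in Gini impurity achieved by splitting on the attribute."""
    labels = [row["label"] for row in rows]
    groups = partition(rows, attribute)
    total = len(rows)
    remainder = sum(len(g) / total * gini(g) for g in groups)
    return gini(labels) - remainder


# Hypothetical training rows, for illustration only.
ROWS = [
    {"plan": "basic", "region": "north", "label": "stay"},
    {"plan": "basic", "region": "south", "label": "stay"},
    {"plan": "basic", "region": "east", "label": "leave"},
    {"plan": "premium", "region": "north", "label": "leave"},
    {"plan": "premium", "region": "south", "label": "leave"},
    {"plan": "premium", "region": "east", "label": "stay"},
]

for name in ("plan", "region"):
    print(name, gain_ratio(ROWS, name), gini_gain(ROWS, name))
```

Because the two criteria score candidate splits differently, they can select different attributes
for a node, which is one way a person can land in a different leaf under each setting.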
 
 