Decision Trees - Data Mining for the Masses

attributes which are not predictive, such as names, to not be considered in the decision tree model.

e. Add a Decision Tree operator to your stream.

4) In a new, blank spreadsheet in OpenOffice Calc, duplicate the attribute names from your training data set, with the exception of Survival_Result. You will predict this attribute using your decision tree.

5) Enter data for yourself and people that you know into this spreadsheet.

a. For some attributes, you may have to decide what to put. For example, the author acknowledges that, based on how relentlessly he searches for the absolute cheapest ticket when shopping for airfare, he almost certainly would have been in 3rd class had he been on the Titanic. He also knows some people who very likely would have been in 1st class.

b. If you want to include some people in your data set but don't know every attribute for them, remember that decision trees can handle some missing values.

c. Save this spreadsheet as a CSV file and import it into your RapidMiner repository.

d. Drag this data set into your process and ensure that attributes that are not predictive, such as names, will not be included as predictors in the model.

6) Apply your decision tree model to your scoring data set.

7) Run your model using gain_ratio. Report your tree nodes, and discuss whether you and the people you know would have lived, died or been lost.

8) Re-run your model using gini_index. Report the differences in your tree's structure. Discuss whether your chances of survival increase under Gini.

9) Experiment with changing leaf and split sizes, and other decision tree algorithm criteria, such as information_gain. Analyze and report your results.
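If you would rather build the scoring file for steps 4 and 5 in code than type it by hand in OpenOffice Calc, the following Python sketch copies the training attribute names into a CSV header, leaving out Survival_Result. Any value you don't know is written as a blank cell. The attribute names (Name, Passenger_Class, Age_Group, Gender), their values, and the sample people are hypothetical placeholders, so substitute the columns from your own Titanic training data. The helper functions are illustrative only and are not part of RapidMiner.

```python
import csv
import io

# Hypothetical attribute names; replace them with the columns of your own training data.
TRAINING_ATTRIBUTES = ["Name", "Passenger_Class", "Age_Group", "Gender", "Survival_Result"]
LABEL = "Survival_Result"   # the attribute the decision tree will predict
NON_PREDICTIVE = {"Name"}   # kept in the file for reference, but not used as a predictor


def scoring_header(attributes, label):
    """Duplicate the training attribute names, leaving out the label to be predicted."""
    return [a for a in attributes if a != label]


def predictor_attributes(header, non_predictive):
    """Return only the attributes the model should be allowed to split on."""
    return [a for a in header if a not in non_predictive]


def to_csv(rows, header):
    """Serialize scoring rows; an attribute missing from a row becomes a blank cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=header, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical scoring rows for yourself and people you know.
PEOPLE = [
    {"Name": "Me", "Passenger_Class": "3rd", "Age_Group": "Adult", "Gender": "Male"},
    {"Name": "Friend A", "Passenger_Class": "1st", "Gender": "Female"},  # Age_Group unknown
]

HEADER = scoring_header(TRAINING_ATTRIBUTES, LABEL)
PREDICTORS = predictor_attributes(HEADER, NON_PREDICTIVE)
SCORING_CSV = to_csv(PEOPLE, HEADER)

if __name__ == "__main__":
    print(SCORING_CSV)
    print("Predictors:", PREDICTORS)
```

A blank cell is one common way to mark a missing value in a CSV file. Check how the RapidMiner import wizard treats empty cells in your version before relying on this.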
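Steps 7 through 9 compare gain_ratio, gini_index and information_gain. To see how these criteria score a split, it helps to compute them by hand for a few candidate attributes. The Python sketch below uses the standard textbook formulas:

- Information gain is the drop in entropy from a parent node to its children.
- Gain ratio divides that gain by the split information, which is the entropy of the attribute's own values.
- The Gini variant measures the drop in Gini impurity.

This sketch assumes RapidMiner uses these same formulations; its internal implementation may differ in detail. The eight rows and the attribute names Gender and Passenger_Class are invented for illustration and are not real Titanic records.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gini(values):
    """Gini impurity of a list of discrete values."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def split_groups(rows, attr, label):
    """Group the label values of `rows` by each distinct value of `attr`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[label])
    return list(groups.values())


def impurity_drop(rows, attr, label, measure):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    parent = [row[label] for row in rows]
    n = len(parent)
    children = split_groups(rows, attr, label)
    return measure(parent) - sum(len(g) / n * measure(g) for g in children)


def information_gain(rows, attr, label):
    return impurity_drop(rows, attr, label, entropy)


def gain_ratio(rows, attr, label):
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy([row[attr] for row in rows])
    return information_gain(rows, attr, label) / split_info if split_info > 0 else 0.0


def gini_gain(rows, attr, label):
    return impurity_drop(rows, attr, label, gini)


# Invented toy rows for illustration only (not real Titanic records).
ROWS = [
    {"Gender": "Female", "Passenger_Class": "1st", "Survival_Result": "Lived"},
    {"Gender": "Female", "Passenger_Class": "1st", "Survival_Result": "Lived"},
    {"Gender": "Female", "Passenger_Class": "3rd", "Survival_Result": "Died"},
    {"Gender": "Male", "Passenger_Class": "1st", "Survival_Result": "Lived"},
    {"Gender": "Male", "Passenger_Class": "3rd", "Survival_Result": "Died"},
    {"Gender": "Male", "Passenger_Class": "3rd", "Survival_Result": "Died"},
    {"Gender": "Male", "Passenger_Class": "2nd", "Survival_Result": "Lost"},
    {"Gender": "Female", "Passenger_Class": "2nd", "Survival_Result": "Lived"},
]

if __name__ == "__main__":
    for attr in ("Gender", "Passenger_Class"):
        print(
            f"{attr:16}"
            f" information_gain={information_gain(ROWS, attr, 'Survival_Result'):.3f}"
            f" gain_ratio={gain_ratio(ROWS, attr, 'Survival_Result'):.3f}"
            f" gini_gain={gini_gain(ROWS, attr, 'Survival_Result'):.3f}"
        )
```

With this toy data, Passenger_Class scores higher than Gender under all three criteria. On real data the criteria can disagree, which is one reason a tree's structure can change between gain_ratio and gini_index runs.

Gain ratio also penalizes attributes with many distinct values. A name column is an example: its values are unique, so it would otherwise look perfectly predictive.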
