attributes which are not predictive, such as names, to not be considered in the
decision tree model.
e. Add a Decision Tree operator to your stream.
4) In a new, blank spreadsheet in OpenOffice Calc, duplicate the attribute names from your
training data set, with the exception of Survival_Result. You will predict this attribute
using your decision tree.
5) Enter data for yourself and people that you know into this spreadsheet.
a. For some attributes, you may have to decide what value to enter. For example, the
author acknowledges that, given how relentlessly he searches for the absolute
cheapest ticket when shopping for airfare, he almost certainly would have been in
3rd class if he had been on the Titanic. He also knows some people who very
likely would have been in 1st class.
b. If you want to include some people in your data set but you don't know every single
attribute for them, remember that decision trees can handle some missing values.
c. Save this spreadsheet as a CSV file and import it into your RapidMiner repository.
d. Drag this data set into your process and ensure that attributes that are not predictive,
such as names, will not be included as predictors in the model.
6) Apply your decision tree model to your scoring data set.
7) Run your model using gain_ratio. Report your tree nodes, and discuss whether you and
the people you know would have lived, died or been lost.
8) Re-run your model using gini_index. Report differences in your tree's structure. Discuss
whether your chances for survival increase under Gini.
9) Experiment with changing leaf and split sizes, and other decision tree algorithm criteria,
such as information_gain. Analyze and report your results.
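
When comparing the trees produced in steps 7 through 9, it can help to see how the three criteria score one and the same candidate split. The following Python sketch is an illustration only and is not part of the RapidMiner process: it uses the standard textbook definitions of information gain, gain ratio, and Gini impurity reduction, and RapidMiner's own implementation may differ in its details. The function names, the passenger_class values, and the "Survived"/"Died" labels are made up for the example and are not taken from the exercise data set.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def split_scores(values, labels):
    """Score a split of `labels` on a categorical attribute `values`.

    Returns the textbook information gain, gain ratio, and reduction
    in Gini impurity for that one split.
    """
    total = len(labels)

    # Group the class labels by the attribute value of each row.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)

    # Weight of each branch = share of rows that fall into it.
    branches = list(groups.values())
    weights = [len(branch) / total for branch in branches]

    weighted_entropy = sum(w * entropy(b) for w, b in zip(weights, branches))
    weighted_gini = sum(w * gini(b) for w, b in zip(weights, branches))

    information_gain = entropy(labels) - weighted_entropy

    # Gain ratio divides by the entropy of the split itself, which
    # penalizes attributes that break the data into many small branches.
    split_info = -sum(w * log2(w) for w in weights)
    gain_ratio = information_gain / split_info if split_info > 0 else 0.0

    return {
        "information_gain": information_gain,
        "gain_ratio": gain_ratio,
        "gini_index": gini(labels) - weighted_gini,
    }


# Made-up rows for illustration only (not the exercise data).
passenger_class = ["1st", "1st", "2nd", "3rd", "3rd", "3rd"]
outcome = ["Survived", "Survived", "Survived", "Died", "Died", "Survived"]

for name, score in split_scores(passenger_class, outcome).items():
    print(f"{name}: {score:.3f}")
```

Because each criterion ranks candidate splits on a different scale, the attribute chosen at a given node, and therefore the structure of the tree, can change when you switch criteria, even though the training data stays the same.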