5) If your decision tree is large or hard to read, how can you adjust its visual layout to
improve readability?
EXERCISE
For this chapter's exercise, you will build a decision tree to predict whether you, and others
you know, would have lived, died, or been lost had you been on the Titanic. Complete the
following steps.
1) Conduct an Internet search for passenger lists for the Titanic. The search term 'Titanic
passenger list' in your favorite search engine will yield a number of web sites containing
lists of passengers.
2) From the sources you find, select a sample of passengers. You do not need to construct a
training data set of every passenger on the Titanic (unless you want to), but get at least 30,
and preferably more. The larger and more varied your training data set is, the more
interesting your results will be.
3) In a spreadsheet in OpenOffice Calc, enter these passengers' data.
a. Record attributes such as their name, age, gender, class of service they traveled in,
race or nationality if known, or other attributes that may be available to you
depending on the detail level of the data source you find.
b. Be sure to have at least four attributes, preferably more. Remember that the
passengers' names or ID numbers won't be predictive, so that attribute shouldn't
be counted as one of your predictor attributes.
c. Add to your data set whether the person lived (i.e., was rescued from a lifeboat or
from the water), died (i.e., their body was recovered), or was lost (i.e., was on the
Titanic's manifest but was never accounted for and was therefore presumed dead after
the ship's sinking). Call this attribute 'Survival_Result'.
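The data layout described in steps a through c can also be sketched in code, which may help when checking a spreadsheet before it is exported. The following is a minimal Python sketch, not part of the exercise's required workflow. Every column name other than Survival_Result is an assumption chosen for illustration, the helper name rows_to_csv is hypothetical, and the rows are placeholders rather than real passengers.

```python
import csv
import io

# Column names are assumptions for illustration; the exercise itself
# only names the 'Survival_Result' attribute.
COLUMNS = ["Name", "Age", "Gender", "Class", "Nationality", "Survival_Result"]

# The three outcomes the exercise asks you to record.
ALLOWED_RESULTS = {"lived", "died", "lost"}


def rows_to_csv(rows):
    """Return CSV text for the passenger rows, rejecting unknown label values."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if row["Survival_Result"] not in ALLOWED_RESULTS:
            raise ValueError(
                f"unexpected Survival_Result: {row['Survival_Result']!r}"
            )
        writer.writerow(row)
    return buffer.getvalue()


# Placeholder rows only (NOT real passengers); replace with the data you collect.
sample_rows = [
    {"Name": "Placeholder A", "Age": 29, "Gender": "female",
     "Class": "1st", "Nationality": "unknown", "Survival_Result": "lived"},
    {"Name": "Placeholder B", "Age": 41, "Gender": "male",
     "Class": "3rd", "Nationality": "unknown", "Survival_Result": "lost"},
    {"Name": "Placeholder C", "Age": 35, "Gender": "male",
     "Class": "2nd", "Nationality": "unknown", "Survival_Result": "died"},
]

csv_text = rows_to_csv(sample_rows)
print(csv_text)
```

Here Name is kept only as an identifier, so the sketch has four candidate predictor attributes (Age, Gender, Class, Nationality) plus the Survival_Result label, which matches the minimum the exercise asks for.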
d. Save this spreadsheet as a CSV file and then import it into your RapidMiner
repository. Set the Survival_Result attribute's role to be your label. Set other
 