questions as an opportunity to validate and strengthen your model. Remember that 'pride goeth
before the fall'! Data mining is a process. If you present your data mining results and
recommendations as infallible, you are not participating in the cyclical nature of CRISP-DM, and
you'll likely end up looking foolish sooner or later. CRISP-DM is such a good process precisely
because of its ability to help us investigate data, learn from our investigation, and then do it again
from a more informed position. Evaluation and Deployment are the two steps in the process
where we establish that more informed position.
REVIEW QUESTIONS
1) What is cross-validation and why should you do it?
2) What is a false positive and why might one be generated?
3) Why would false positives not negate all value for a data mining model?
4) How does a model's overall performance percentage relate to the target attribute's (label's)
individual performance percentages?
5) How can changing a data mining methodology's underlying algorithm affect a model's
cross-validation performance percentages?
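Questions 2 through 4 all come back to the confusion matrix that a classification performance check produces. The Python below is a minimal sketch that assumes a binary label and uses made-up counts (hypothetical, not results from this chapter). It computes overall accuracy, per-class recall (roughly what RapidMiner's Performance (Classification) output lists as class recall) and positive-class precision. It also shows that overall accuracy equals the per-class recalls averaged with each class's size as the weight. The 20 false positives pull "survived" precision down to 80%, but four out of five "survived" predictions are still correct.

```python
"""Overall vs. per-class performance from a binary confusion matrix.

All counts are hypothetical; they are not results from the book.
"""


def class_recall(correct: int, missed: int) -> float:
    """Share of the actual members of a class that the model labeled correctly."""
    return correct / (correct + missed)


def class_precision(correct: int, false_alarms: int) -> float:
    """Share of the model's predictions for a class that were right."""
    return correct / (correct + false_alarms)


def overall_accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Correct predictions of either class divided by all predictions."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical confusion matrix with "survived" as the positive class.
tp, fp, fn, tn = 80, 20, 40, 160

acc = overall_accuracy(tp, fp, fn, tn)     # (80 + 160) / 300 = 0.80
recall_pos = class_recall(tp, fn)          # 80 / 120, about 0.667
recall_neg = class_recall(tn, fp)          # 160 / 180, about 0.889
precision_pos = class_precision(tp, fp)    # 80 / 100 = 0.80, lowered by false positives

# Overall accuracy is the average of the per-class recalls, weighted by
# how many actual examples each class has.
n_pos, n_neg = tp + fn, tn + fp
weighted = (recall_pos * n_pos + recall_neg * n_neg) / (n_pos + n_neg)

print(f"accuracy            = {acc:.3f}")
print(f"recall(survived)    = {recall_pos:.3f}")
print(f"recall(died)        = {recall_neg:.3f}")
print(f"precision(survived) = {precision_pos:.3f}")
print(f"weighted recalls    = {weighted:.3f}")
```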
EXERCISE
For this chapter's exercise, you will create a cross-validation model for your Chapter 10 exercise
training data set. Complete the following steps.
1) Open RapidMiner to a new, blank process and add the training data set you created for
your Chapter 10 exercise (the Titanic survival data set).
2) Set roles as necessary.
3) Apply a cross-validation operator to the data set.
4) Configure your sub-process with a Decision Tree operator whose criterion is set to gain_ratio.
Apply the model and run it through a Performance (Classification) operator.
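The exercise is done entirely in RapidMiner and needs no code. Readers who want to see what the cross-validation and gain_ratio steps compute can look at the rough, stdlib-only Python sketch below. It rests on several assumptions and is not RapidMiner's implementation. The records and attribute names (`sex`, `cls`, `age`) are invented, every attribute is categorical, folds are shuffled rather than stratified, and the tree is never pruned. Each fold trains a tree on the other folds and predicts the held-out rows. The pooled confusion counts stand in for the Performance (Classification) output.

```python
"""k-fold cross-validation of a gain-ratio decision tree (stdlib only).

A simplified analogue of the exercise's RapidMiner process. The data rows
and attribute names are hypothetical and exist only to exercise the code.
"""
import math
import random
from collections import Counter

# Invented passenger-style records: (sex, class, age group, survived).
# These are NOT the book's Titanic data.
DATA = [
    ("F", "1st", "adult", "yes"), ("F", "2nd", "adult", "yes"),
    ("F", "3rd", "adult", "no"), ("F", "3rd", "child", "yes"),
    ("F", "1st", "child", "yes"), ("M", "1st", "adult", "no"),
    ("M", "1st", "child", "yes"), ("M", "2nd", "adult", "no"),
    ("M", "2nd", "child", "yes"), ("M", "3rd", "adult", "no"),
    ("M", "3rd", "child", "no"), ("M", "3rd", "adult", "no"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Information gain from splitting on attr, divided by the split information."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    total = len(labels)
    weights = [len(g) / total for g in groups.values()]
    remainder = sum(w * entropy(g) for w, g in zip(weights, groups.values()))
    split_info = -sum(w * math.log2(w) for w in weights)
    if split_info == 0:  # attr has a single value here, so splitting is useless
        return 0.0
    return (entropy(labels) - remainder) / split_info


def build_tree(rows, labels, attrs):
    """Grow an unpruned tree; a leaf is a label, a node is (attr, branches, majority)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0:
        return majority
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return (best, branches, majority)


def predict(tree, row):
    """Walk down the tree; fall back to a node's majority label for unseen values."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        if row[attr] not in branches:
            return majority
        tree = branches[row[attr]]
    return tree


def cross_validate(rows, labels, k=3, seed=0):
    """Train on k-1 folds, test on the held-out fold, and pool (actual, predicted) counts."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    attrs = list(rows[0])
    confusion = Counter()
    for held_out in folds:
        test = set(held_out)
        train = [i for i in order if i not in test]
        tree = build_tree([rows[i] for i in train], [labels[i] for i in train], attrs)
        for i in held_out:
            confusion[(labels[i], predict(tree, rows[i]))] += 1
    return confusion


rows = [{"sex": s, "cls": c, "age": a} for s, c, a, _ in DATA]
labels = [record[3] for record in DATA]

confusion = cross_validate(rows, labels, k=3)
correct = sum(n for (actual, pred), n in confusion.items() if actual == pred)
print(f"accuracy: {correct / len(labels):.2%}")
for (actual, pred), n in sorted(confusion.items()):
    print(f"actual={actual} predicted={pred} count={n}")
```

Every row is scored exactly once, by a tree that never saw it during training. That is why the pooled figure is a fairer estimate than scoring the training data. Replacing the division by `split_info` with plain information gain is one quick way to watch a different splitting criterion change the numbers.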