11) Re-run the model. You do not need to switch back to the main process to re-run the
model; you may if you wish, but you can stay in sub-process view and run it from there as
well. When you return from the results perspective to the design perspective, you will see
whichever design window you were last in. When you re-run the model, you will see a new
performance matrix showing the model's predictive power with the Gini index as the
decision tree's splitting criterion.
Figure 13-10. New cross-validation performance results based on the gini_index decision tree model.
We see in Figure 13-10 that our model's ability to predict is significantly improved when
we use the Gini index for our decision tree model. This should not come as a great
surprise: we saw in Chapter 10 that our tree's detail was much more granular under Gini,
and greater detail in the predictive tree should result in a more reliably predictive model.
Adding more and better examples to the training data set would likely raise the model's
reliability even further.
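The process described in this chapter is built in RapidMiner's graphical interface, so there is no code in the text itself, but the same comparison can be sketched in Python with scikit-learn. The snippet below is a minimal illustration under assumed names and data, not the book's actual process: the synthetic data set produced by make_classification stands in for the training data, cross_val_score plays the role of the cross-validation operator, and confusion_matrix plays the role of the performance operator.

# A minimal sketch in Python/scikit-learn: cross-validate a decision tree
# that splits on the Gini index versus one that splits on information gain
# (entropy), then print a confusion matrix analogous to the performance
# matrix in Figure 13-10. The data set here is synthetic, not the book's.
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a labeled training data set.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, random_state=42)
    scores = cross_val_score(tree, X, y, cv=10)  # 10-fold cross-validation
    print(f"{criterion}: mean accuracy = {scores.mean():.3f}")

# Confusion matrix for the Gini tree, built from cross-validated predictions.
gini_tree = DecisionTreeClassifier(criterion="gini", random_state=42)
predicted = cross_val_predict(gini_tree, X, y, cv=10)
print(confusion_matrix(y, predicted))

In the printed matrix, rows are true classes and columns are predicted classes, just as in the performance matrix shown in the results perspective.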
CHAPTER SUMMARY: THE VALUE OF EXPERIENCE
So now we have seen one way to statistically evaluate a model's reliability. You have seen that
there are a number of cross-validation and performance operators that you can use to check how
well a model built from a training data set is likely to perform. But the bottom line is that there is
no substitute for experience and expertise. Use subject matter experts to review your data mining
results, and ask them to give you feedback on your model's output. Run pilot tests and use focus
groups to try out your model's predictions before rolling them out organization-wide. Do not be
offended if someone questions or challenges the reliability of your model's results; be humble
enough to take their feedback into consideration.