CHAPTER THIRTEEN:
EVALUATION AND DEPLOYMENT
HOW FAR WE'VE COME
The purpose of this book, which was explained in Chapter 1, is to introduce non-experts and
non-computer scientists to some of the methods and tools of data mining. Certainly there have been a
number of processes, tools, operators, data manipulation techniques, etc., demonstrated in this
book, but perhaps the most important lesson to take away from this broad treatment of data
mining is that the field has become huge, complex, and dynamic. You have learned about the
CRISP-DM process, and had it shown to you numerous times as you have seen data mining
models that classified, predicted, and did both. You have seen a number of data processing tools
and techniques, and as you have done this, you have hopefully noticed the myriad other operators
in RapidMiner that we did not use or discuss. Although you may be feeling like you're getting
good at data mining (and we hope you are), please recognize that there is a world of data mining
that this book has not touched on, so there is still much for you to learn.
This chapter and the next will discuss some precautions that should be taken before putting any
real-world data mining results into practice. This chapter will demonstrate a method for using
RapidMiner to conduct some validation of data mining models, while Chapter 14 will discuss the
choices you will make as a data miner, and some ways to guide those choices in good directions.
Remember from Chapter 1 that CRISP-DM is cyclical—you should always be learning from the
work you are doing, and feeding what you've learned from your work back into your next data
mining activity.
For example, suppose you used a Replace Missing Values operator in a data mining model to set all
missing values in a data set to the average for each attribute. Suppose further that you used the
results of that data mining model in making decisions for your company, and that those decisions turned
out to be less than ideal. What if you traced those decisions back to your data mining activities and
found that by using the average, you made some general assumptions that weren't really very