3) Using the green right and left arrows, you can select which attributes you would like to
keep. Suppose we were going to study the demographics of Internet users. In this
instance, we might select Birth_Year, Gender, Marital_Status, Race, and perhaps
Years_on_Internet, and move them to the right under Selected Attributes using the right
green arrow. You can select more than one attribute at a time by holding down your
control or shift keys (on a Windows computer) while clicking on the attributes you want to
select or deselect. We could then click OK, and these would be the only attributes we
would see in results perspective when we run our model. All subsequent downstream data
mining operations added to our model will act only upon this subset of our attributes.
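
The attribute selection described above is done in RapidMiner's graphical interface. For readers who want to see the same idea outside the tool, here is a minimal Python sketch, not the RapidMiner procedure itself. The attribute names come from the text; the sample rows, the extra Twitter column, and the helper name select_attributes are hypothetical and serve only as illustration.

```python
import csv
import io

# Hypothetical sample data; only the five selected column names come from the text.
RAW_CSV = """Birth_Year,Gender,Marital_Status,Race,Years_on_Internet,Twitter
1975,M,M,White,8,N
1988,F,S,Hispanic,5,Y
"""

# The attributes moved under "Selected Attributes" in the example above.
SELECTED_ATTRIBUTES = [
    "Birth_Year",
    "Gender",
    "Marital_Status",
    "Race",
    "Years_on_Internet",
]


def select_attributes(rows, keep):
    """Return new rows that hold only the attributes named in keep."""
    return [{name: row[name] for name in keep} for row in rows]


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
subset = select_attributes(rows, SELECTED_ATTRIBUTES)

# Any later step that works on subset sees only the selected attributes.
print(subset[0])
```

As in RapidMiner, everything downstream of the selection works only on the reduced set of attributes; the unselected column is simply absent from subset.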
CHAPTER SUMMARY
This chapter has introduced you to a number of concepts related to data preparation. Recall that
Data Preparation is the third step in the CRISP-DM process. Once you have established
Organizational Understanding as it relates to your data mining plans, and developed Data
Understanding in terms of what data you need, what data you have, where it is located, and so
forth, you can begin to prepare your data for mining. This has been the focus of this chapter.
The chapter used a small and very simple data set to help you learn to set up the RapidMiner data
mining environment. You have learned about viewing data sets in OpenOffice Base, and learned
some ways that data sets in relational databases can be collated. You have also learned about
comma separated values (CSV) files.
We then stepped through adding CSV files to a RapidMiner data repository in order to
handle missing data, reduce data through observation filtering, handle inconsistencies in data, and
reduce the number of attributes in a model. All of these methods will be used in future chapters to
prepare data for modeling.
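
To make the four preparation tasks concrete, the sketch below walks through them in plain Python rather than in RapidMiner. It rests on several assumptions that are not taken from the chapter: the sample rows, filling missing values with the attribute's average, the 1900 cutoff for filtering observations, and treating any Twitter value other than Y or N as inconsistent.

```python
import csv
import io

# Hypothetical data: a blank cell is a missing value, "99" is an inconsistent code.
RAW_CSV = """Birth_Year,Gender,Years_on_Internet,Twitter
1975,M,8,N
1988,F,,Y
1890,F,5,Y
1992,M,3,99
"""


def mean_of(rows, name):
    """Average of the non-missing values of one attribute."""
    values = [float(r[name]) for r in rows if r[name] != ""]
    return sum(values) / len(values)


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# 1) Handle missing data: replace blanks with the attribute's average (assumed strategy).
avg_years = mean_of(rows, "Years_on_Internet")
for r in rows:
    if r["Years_on_Internet"] == "":
        r["Years_on_Internet"] = str(round(avg_years, 2))

# 2) Reduce data by filtering observations (the 1900 cutoff is an assumption).
rows = [r for r in rows if int(r["Birth_Year"]) >= 1900]

# 3) Handle inconsistencies: values outside the expected Y/N codes become missing.
for r in rows:
    if r["Twitter"] not in ("Y", "N"):
        r["Twitter"] = ""

# 4) Reduce the number of attributes: keep only those needed for modeling.
keep = ["Birth_Year", "Years_on_Internet", "Twitter"]
prepared = [{name: r[name] for name in keep} for r in rows]

for r in prepared:
    print(r)
```

The order of the steps matters here: the average in step 1 is computed before any observations are filtered out, so changing the order would change the value used to fill the gap.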
Data mining is most successful when conducted upon a foundation of well-prepared data. Recall
the quotation in Chapter 1 from Alice's Adventures in Wonderland: which way you go does not
matter very much if you don't know, or don't care, where you are going. Likewise, the value of
where you arrive when you complete a data mining exercise will largely depend upon how well you
prepared to get there. Sometimes we hear the phrase “It's better than nothing”. Well, in data
mining, results gleaned from poorly prepared data might be “Worse than nothing”, because they
 