may be misleading. Decisions based upon them could lead an organization down a detrimental
and costly path. Learn to value the process of data preparation, and you will learn to be a better
data miner.
REVIEW QUESTIONS
1) What are the four main processes of data preparation discussed in this chapter? What do
they accomplish and why are they important?
2) What are some ways to collate data from a relational database?
3) For what kinds of problems might a data set need to be scrubbed?
4) Why is it often better to perform reductions using operators rather than excluding
attributes or observations as data are imported?
5) What is a data repository in RapidMiner and how is one created?
6) How might inconsistent data cause later trouble in data mining activities?
EXERCISE
1) Locate a data set with any number of attributes and observations. You may have access to
data sets through personal data collection or through your employment, although if you
use an employer's data, make sure to do so only with permission! You can also search the
Internet for data set libraries. A simple search on the term 'data sets' in your favorite
search engine will yield a number of web sites that offer libraries of data sets that you can
use for academic and learning purposes. Download a data set that looks interesting to you
and complete the following:
2) Format the data set as a CSV file. It may already come in this format; if not, open
the data in OpenOffice Calc or some similar software, and then use the File > Save As
feature to save your data as a CSV file.
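If you would rather script the conversion than use a spreadsheet program, the same step can be sketched in Python. This is only an illustration and not part of the RapidMiner workflow: it assumes the downloaded data is tab-delimited text, and the function name delimited_to_csv and the sample rows are hypothetical.

```python
import csv
import io


def delimited_to_csv(source, target, delimiter="\t"):
    """Copy rows from a delimited text stream into a comma-separated stream.

    Returns the number of rows written, header row included.
    """
    reader = csv.reader(source, delimiter=delimiter)
    writer = csv.writer(target, lineterminator="\n")
    rows_written = 0
    for row in reader:
        # Blank lines come back as empty lists; skip them.
        if not row:
            continue
        # Trim stray whitespace around each value before writing.
        writer.writerow([cell.strip() for cell in row])
        rows_written += 1
    return rows_written


# Hypothetical sample data standing in for a downloaded tab-delimited file.
sample = io.StringIO(
    "name\tcity\tage\n"
    "Ana\tSalt Lake City, UT\t34\n"
    "Bo\tProvo\t29\n"
)
result = io.StringIO()
count = delimited_to_csv(sample, result)
print(count)
print(result.getvalue())
```

To work with real files, pass file objects opened with open(path, newline="") in place of the in-memory streams. Note that the csv module automatically quotes any value that itself contains a comma, so such values stay in a single column when the file is imported later.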
 
 