CHAPTER THREE:
DATA PREPARATION
CONTEXT AND PERSPECTIVE
Jerry is the marketing manager for a small Internet design and advertising firm. Jerry's boss asks
him to develop a data set containing information about Internet users. The company will use this
data to determine what kinds of people are using the Internet and how the firm may be able to
market its services to this group of users.
To accomplish his assignment, Jerry creates an online survey and places links to the survey on
several popular Web sites. Within two weeks, Jerry has collected enough data to begin analysis, but
he finds that his data needs to be denormalized. He also notes that some observations in the set
are missing values or appear to contain invalid values. Jerry realizes that some additional work
on the data needs to take place before analysis begins.
LEARNING OBJECTIVES
After completing the reading and exercises in this chapter, you should be able to:
Explain the concept and purpose of data scrubbing
List possible solutions for handling missing data
Explain the role of data reduction and apply basic data reduction methods
Define and handle inconsistent data
Discuss the importance and process of attribute reduction
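To make these objectives more concrete, the short Python sketch below flags the kinds of problems Jerry noticed in his survey data: missing values, invalid values, and inconsistent values. It is only an illustration and not the method this chapter uses. The field names (age, gender, hours_online), the sample rows, and the plausible age range are all assumptions invented for the example.

```python
# Hypothetical survey rows; field names and values are invented for illustration.
rows = [
    {"age": 34, "gender": "F", "hours_online": 12},
    {"age": None, "gender": "M", "hours_online": 30},  # missing age
    {"age": 27, "gender": "m", "hours_online": 8},     # inconsistent coding ("m" vs "M")
    {"age": 199, "gender": "F", "hours_online": 5},    # implausible age
]

VALID_GENDER_CODES = {"F", "M"}  # assumed coding scheme for this sketch


def find_problems(row):
    """Return a list of data-quality problems found in one survey row."""
    problems = []
    if row["age"] is None:
        problems.append("missing age")
    elif not 0 < row["age"] < 120:  # assumed plausible range for a respondent's age
        problems.append("invalid age")
    if row["gender"] not in VALID_GENDER_CODES:
        problems.append("inconsistent gender code")
    return problems


# Map each row index to the problems found in that row.
report = {index: find_problems(row) for index, row in enumerate(rows)}

# Keep only the rows with no detected problems.
clean_rows = [row for index, row in enumerate(rows) if not report[index]]

for index, problems in report.items():
    print(index, problems if problems else "ok")
```

Dropping every flagged row, as clean_rows does here, is only one possible choice. Depending on the analysis, it may be better to repair a flagged observation or fill in a missing value than to discard the whole row.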
APPLYING THE CRISP DATA MINING MODEL
Recall from Chapter 1 that the CRISP Data Mining methodology requires three phases before any
actual data mining models are constructed. In the Context and Perspective paragraphs above, Jerry