DATA UNDERSTANDING
To investigate her question, Sarah has enlisted our help in creating a correlation matrix of
six attributes. Working together, and drawing on her employer's data resources, which come
primarily from the company's billing database, we create a data set comprising the following
attributes:
Insulation : This is a density rating, ranging from one to ten, indicating the thickness of
each home's insulation. A home with a density rating of one is poorly insulated, while a
home with a density rating of ten has excellent insulation.
Temperature : This is the average outdoor ambient temperature at each home for the
most recent year, measured in degrees Fahrenheit.
Heating_Oil : This is the total number of units of heating oil purchased by the owner of
each home in the most recent year.
Num_Occupants : This is the total number of occupants living in each home.
Avg_Age : This is the average age of those occupants.
Home_Size : This is a rating, on a scale of one to eight, of the home's overall size. The
higher the number, the larger the home.
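
To make the idea of a correlation matrix concrete before building one in RapidMiner, here is a minimal sketch in plain Python. It computes the Pearson correlation coefficient for every pair of columns. The function names pearson and correlation_matrix are hypothetical helpers, and the three short columns are invented toy values, not rows from Sarah's data set; only the attribute names come from the list above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Invented toy values, for illustration only (NOT the chapter's data).
sample = {
    "Insulation": [2, 4, 6, 8, 10],
    "Temperature": [70, 55, 62, 48, 40],
    "Heating_Oil": [120, 180, 150, 220, 260],
}

matrix = correlation_matrix(sample)
for name, row in matrix.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

Each coefficient falls between -1 and 1, every attribute correlates perfectly with itself, and the matrix is symmetric, so only the values on one side of the diagonal need to be read.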
DATA PREPARATION
A CSV data set for this chapter's example is available for download at the book's companion web
site (https://sites.google.com/site/dataminingforthemasses/). If you wish to follow along with
the example, download the Chapter04DataSet.csv file now and save it into your
RapidMiner data folder. Then, complete the following steps to prepare the data set for correlation
mining:
1) Import the Chapter 4 CSV data set into your RapidMiner data repository. Save it with the
name Chapter4. If you need a refresher on how to bring this data set into your
RapidMiner repository, refer to steps 7 through 14 of the Hands On Exercise in Chapter 3.
The steps will be the same, with the exception of which file you select to import. Import
all attributes, and accept the default data types. When you are finished, your repository
should look similar to Figure 4-1.
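
If you would like to sanity-check the file outside RapidMiner before importing it, the sketch below shows one way to do so with Python's standard csv module. It assumes the CSV header row uses exactly the six attribute names listed under Data Understanding, which is an assumption rather than something confirmed here. The load_columns helper is hypothetical, and the in-memory sample rows are invented stand-ins, not the contents of Chapter04DataSet.csv.

```python
import csv
import io

# Attribute names as described in the Data Understanding section.
# Assumption: the CSV header row matches these names exactly.
EXPECTED = [
    "Insulation", "Temperature", "Heating_Oil",
    "Num_Occupants", "Avg_Age", "Home_Size",
]


def load_columns(file_obj):
    """Read a CSV file object into a dict of numeric columns, one per attribute."""
    reader = csv.DictReader(file_obj)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED if name not in header]
    if missing:
        raise ValueError(f"missing attributes: {missing}")
    columns = {name: [] for name in EXPECTED}
    for row in reader:
        for name in EXPECTED:
            columns[name].append(float(row[name]))
    return columns


# Invented stand-in for the real file, so the sketch runs on its own.
sample_csv = io.StringIO(
    "Insulation,Temperature,Heating_Oil,Num_Occupants,Avg_Age,Home_Size\n"
    "3,60,150,2,30,4\n"
    "8,45,210,5,42,6\n"
)
columns = load_columns(sample_csv)
print({name: len(values) for name, values in columns.items()})

# With the downloaded file, the call would look like this instead:
# with open("Chapter04DataSet.csv", newline="") as f:
#     columns = load_columns(f)
```

A ValueError from load_columns would suggest that the header names differ from the attribute names above, or that a value is not numeric; either would be worth checking before accepting the default data types in RapidMiner.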
 
 