Correlation - Data Mining for the Masses


DATA UNDERSTANDING

In order to investigate her question, Sarah has enlisted our help in creating a correlation matrix of six attributes. Working together, and using her employer's data resources, which are drawn primarily from the company's billing database, we create a data set composed of the following attributes:



Insulation: This is a density rating, ranging from one to ten, indicating the thickness of each home's insulation. A home with a density rating of one is poorly insulated, while a home with a density rating of ten has excellent insulation.



Temperature: This is the average outdoor ambient temperature at each home for the most recent year, measured in degrees Fahrenheit.



Heating_Oil: This is the total number of units of heating oil purchased by the owner of each home in the most recent year.



Num_Occupants: This is the total number of occupants living in each home.



Avg_Age: This is the average age of those occupants.



Home_Size: This is a rating, on a scale of one to eight, of the home's overall size. The higher the number, the larger the home.
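A correlation matrix pairs each of these six attributes with every other one. As an illustration outside RapidMiner, the following Python sketch computes a Pearson correlation matrix, one common choice of correlation measure. It assumes the attributes are held as equal-length numeric columns keyed by the names above. The functions `pearson` and `correlation_matrix` are hypothetical helpers, and the sample values in the `__main__` block are invented only to exercise the code; they are not Sarah's data.

```python
from math import sqrt

# The six attributes described above (assumed to be the column names in the data).
ATTRIBUTES = ["Insulation", "Temperature", "Heating_Oil",
              "Num_Occupants", "Avg_Age", "Home_Size"]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance numerator) and each attribute's sum of squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns, names=ATTRIBUTES):
    """Return a nested dict where matrix[a][b] is the correlation of attribute a with b."""
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


if __name__ == "__main__":
    # Invented sample values, for illustration only.
    sample = {
        "Insulation":    [2, 4, 6, 8, 10],
        "Temperature":   [70, 60, 55, 45, 30],
        "Heating_Oil":   [120, 150, 180, 200, 240],
        "Num_Occupants": [1, 3, 2, 5, 4],
        "Avg_Age":       [25, 35, 45, 55, 65],
        "Home_Size":     [1, 3, 4, 6, 8],
    }
    m = correlation_matrix(sample)
    for a in ATTRIBUTES:
        print(f"{a:>14}", " ".join(f"{m[a][b]:6.2f}" for b in ATTRIBUTES))
```

Every entry lies between -1 and 1, and the matrix is symmetric with ones on its diagonal, so only the entries on one side of the diagonal carry new information.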

DATA PREPARATION

A CSV data set for this chapter's example is available for download at the book's companion web site (https://sites.google.com/site/dataminingforthemasses/). If you wish to follow along with the example, download the Chapter04DataSet.csv file now and save it into your RapidMiner data folder. Then complete the following steps to prepare the data set for correlation mining:

1) Import the Chapter 4 CSV data set into your RapidMiner data repository and save it with the name Chapter4. If you need a refresher on how to bring this data set into your RapidMiner repository, refer to steps 7 through 14 of the Hands On Exercise in Chapter 3. The steps are the same, except for the file you select to import. Import all attributes and accept the default data types. When you are finished, your repository should look similar to Figure 4-1.
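For readers who want to inspect the same file outside RapidMiner, here is a minimal Python sketch of the import step using only the standard library. It assumes the CSV header row uses the six attribute names listed under Data Understanding. It converts every value to a float, which is a simplification and not how RapidMiner assigns its default data types, and it rejects rows whose Insulation or Home_Size rating falls outside the documented ranges of one to ten and one to eight. The function name `load_columns` and the two-row in-memory sample are hypothetical.

```python
import csv
import io

# Attribute names from the Data Understanding section; this sketch assumes
# the CSV header row uses exactly these names.
ATTRIBUTES = ["Insulation", "Temperature", "Heating_Oil",
              "Num_Occupants", "Avg_Age", "Home_Size"]

# Documented rating ranges (inclusive): Insulation 1-10, Home_Size 1-8.
RANGES = {"Insulation": (1, 10), "Home_Size": (1, 8)}


def load_columns(text_stream):
    """Read CSV rows into {attribute: [float, ...]}, checking documented ranges."""
    reader = csv.DictReader(text_stream)
    header = reader.fieldnames or []  # reading fieldnames consumes the header row
    missing = [a for a in ATTRIBUTES if a not in header]
    if missing:
        raise ValueError(f"missing attributes: {missing}")
    columns = {a: [] for a in ATTRIBUTES}
    for row_no, row in enumerate(reader, start=1):
        for a in ATTRIBUTES:
            value = float(row[a])  # simplification: every attribute becomes a float
            low, high = RANGES.get(a, (float("-inf"), float("inf")))
            if not low <= value <= high:
                raise ValueError(f"data row {row_no}: {a}={value} outside {low}..{high}")
            columns[a].append(value)
    return columns


if __name__ == "__main__":
    # Two made-up rows, only to show the shape of the result.
    demo = io.StringIO(
        "Insulation,Temperature,Heating_Oil,Num_Occupants,Avg_Age,Home_Size\n"
        "4,60,150,3,40,5\n"
        "8,45,200,2,55.5,7\n"
    )
    print(load_columns(demo))
```

To read the downloaded file instead, pass an open file handle, for example `with open("Chapter04DataSet.csv", newline="") as f: columns = load_columns(f)`. The result holds one list of values per attribute, in the order the attributes are listed above.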
