participate in the programs she will offer. She also understands that there are probably policy
holders with high weight and low cholesterol, those with high weight and high cholesterol, and
those with low weight and high cholesterol. She further recognizes there are likely to be a lot of
people somewhere in between. In order to accomplish her goal, she needs to search among the
thousands of policy holders to find groups of people with similar characteristics and craft
programs and communications that will be relevant and appealing to people in these different
groups.
DATA UNDERSTANDING
Using the insurance company's claims database, Sonia extracts three attributes for 547 randomly
selected individuals. The three attributes are the insured's weight in pounds as recorded on the
person's most recent medical examination, their last cholesterol level determined by blood work in
their doctor's lab, and their gender. As is typical in many data sets, the gender attribute uses 0 to
indicate Female and 1 to indicate Male. We will use this sample data from Sonia's employer's
database to build a cluster model to help Sonia understand how her company's clients, the health
insurance policy holders, appear to group together on the basis of their weights, genders and
cholesterol levels. As we do this, we should remember that means are particularly susceptible to
undue influence by extreme outliers, so it is very important to watch for inconsistent data when
using the k-means clustering methodology.
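To see why outliers matter here, the following minimal Python sketch compares the mean and the median of a small list of weights before and after a single data-entry error is added. The values are invented for illustration and are not drawn from Sonia's data set; the point is only that one extreme value can drag a mean, and therefore a k-means cluster center, far from the typical observation.

```python
from statistics import mean, median

# Hypothetical weights in pounds; NOT taken from the chapter's data set.
weights = [150, 155, 160, 165, 170]

# The same values plus one data-entry error (1650 typed instead of 165).
weights_with_outlier = weights + [1650]

clean_mean = mean(weights)
skewed_mean = mean(weights_with_outlier)
clean_median = median(weights)
skewed_median = median(weights_with_outlier)

# The mean moves a long way; the median barely changes.
print("mean:  ", clean_mean, "->", round(skewed_mean, 1))
print("median:", clean_median, "->", skewed_median)
```

Because k-means places each cluster center at the mean of its members, a single bad record like this one could pull a center well away from the people it is meant to represent.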
DATA PREPARATION
As with previous chapters, a data set has been prepared for this chapter's example, and is available
as Chapter06DataSet.csv on the book's companion web site. If you would like to follow along
with this example exercise, go ahead and download the data set now, and import it into your
RapidMiner data repository. At this point you are probably getting comfortable with importing
CSV data sets into a RapidMiner repository, but remember that the steps are outlined in Chapter 3
if you need to review them. Be sure to designate the attribute names correctly and to check your
data types as you import. Once you have imported the data set, drag it into a new, blank process
window so that you can begin to set up your k-means clustering data mining model. Your process
should look like Figure 6-1.
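If you would like to sanity-check attribute names and data types outside of RapidMiner, something like the following Python sketch may help. It is not part of the RapidMiner workflow described here. The column names Weight, Cholesterol and Gender are assumptions based on the attribute descriptions above, the function name load_policy_holders is hypothetical, and the two sample rows are made up so that the example runs without the real file.

```python
import csv
import io

# Assumed column names; compare them with the header row of the real file.
EXPECTED_COLUMNS = ["Weight", "Cholesterol", "Gender"]


def load_policy_holders(file_obj):
    """Parse CSV rows into (weight, cholesterol, gender) tuples.

    Raises ValueError if the header is unexpected, a value is not an
    integer, or gender is something other than 0 or 1.
    """
    reader = csv.DictReader(file_obj)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    # Data starts on line 2 because line 1 holds the header.
    for line_no, raw in enumerate(reader, start=2):
        weight = int(raw["Weight"])            # pounds
        cholesterol = int(raw["Cholesterol"])  # most recent lab result
        gender = int(raw["Gender"])            # 0 = Female, 1 = Male
        if gender not in (0, 1):
            raise ValueError(f"line {line_no}: gender must be 0 or 1")
        rows.append((weight, cholesterol, gender))
    return rows


# Made-up sample standing in for the downloaded data set.
sample = io.StringIO("Weight,Cholesterol,Gender\n102,111,1\n115,135,0\n")
rows = load_policy_holders(sample)
print(rows)
```

To run the same check against the downloaded file, you could pass a file object opened with open("Chapter06DataSet.csv", newline="") in place of the in-memory sample, adjusting EXPECTED_COLUMNS if the actual header differs.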