target in the hopes of raising awareness, educating policy holders, and modifying behaviors that
will lead to lower incidence of heart disease among her employer's clients.
CHAPTER SUMMARY
k-Means clustering is a data mining model that falls primarily on the side of Classification when
referring to the Venn diagram from Chapter 1 (Figure 1-2). For this chapter's example, it does not
necessarily predict which insurance policy holders will or will not develop heart disease. It simply
takes known indicators from the attributes in a data set and groups observations together based on
how similar their attribute values are to group averages. Because any attributes that can be quantified can also have
means calculated, k-means clustering provides an effective way of grouping observations together
based on what is typical or normal for that group. It also helps us understand where one group
begins and the other ends, or in other words, where the natural breaks occur between groups in a
data set.
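The grouping process summarized above can be sketched in a few lines of Python. This is only a minimal illustration of the general k-means idea, not the procedure RapidMiner itself runs: the function name k_means, the max_steps parameter, the choice of the first k observations as starting means, the use of Euclidean distance, and the six weight and cholesterol values are all assumptions made for the example.

```python
import math

def k_means(points, k, max_steps=100):
    """Group points into k clusters by similarity to group averages."""
    # Assumption: start from the first k observations; tools usually
    # choose random starting points instead.
    centroids = [list(p) for p in points[:k]]
    assignments = None
    for _ in range(max_steps):
        # Assign each observation to the nearest centroid (Euclidean distance).
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no observation changed cluster, so the groups are stable
        assignments = new_assignments
        # Recompute each centroid as the mean of the observations assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments

# Hypothetical policy holders described by (weight, cholesterol).
holders = [(120, 150), (125, 160), (130, 155),
           (200, 230), (210, 240), (205, 235)]

centroids, clusters = k_means(holders, k=2)
print(centroids)  # [[125.0, 155.0], [205.0, 235.0]]
print(clusters)   # [0, 0, 0, 1, 1, 1]
```

The two centroids are the attribute means of each group, and the natural break between the groups falls between the third and fourth observations.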
k-Means clustering is very flexible in its ability to group observations together. The k-Means
operator in RapidMiner allows data miners to set the number of clusters they wish to generate, to
dictate the number of sample means used to determine the clusters, and to use a number of
different algorithms to evaluate means. While fairly simple in its set-up and definition, k-Means
clustering is a powerful method for finding natural groups of observations in a data set.
REVIEW QUESTIONS
1) What does the k in k-Means clustering stand for?
2) How are clusters identified? What process does RapidMiner use to define clusters and
place observations in a given cluster?
3) What does the Centroid Table tell the data miner? How do you interpret the values in a
Centroid Table?
4) How do descriptive statistics aid in the process of evaluating and deploying a k-Means
clustering model?
 
 