Summary
Clustering analysis groups similar objects based on the objects' attributes.
Clustering is applied in areas such as marketing, economics, biology, and medicine.
This chapter presented a detailed explanation of the k-means algorithm and its
implementation in R. To use k-means properly, it is important to do the following:
• Properly scale the attribute values to prevent certain attributes from
dominating the other attributes.
• Ensure that the concept of distance between the assigned values within an
attribute is meaningful.
• Choose the number of clusters, k, such that the Within Sum of Squares (WSS)
is reasonably small and adding another cluster yields only a marginal
reduction. A plot of WSS against k, such as the example in Figure 4.5, can be
helpful in this respect.
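The first and third points above can be sketched in a few lines. The chapter's own implementation is in R; what follows is only an illustrative Python sketch using the standard library. The function names (standardize, kmeans_wss), the synthetic age and income data, and the choice of 25 random starts are all assumptions made for illustration, not part of the chapter.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Rescale each attribute (column) to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(r, mus, sds)) for r in rows]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_wss(points, k, n_starts=25, n_iter=50, seed=0):
    """Smallest WSS found over several random starts of a basic k-means loop."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_starts):
        centers = rng.sample(points, k)  # k distinct observations as initial centers
        for _ in range(n_iter):
            clusters = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
                clusters[j].append(p)
            # Move each center to the mean of its cluster; keep it if the cluster is empty.
            new_centers = [
                tuple(mean(c) for c in zip(*cl)) if cl else centers[i]
                for i, cl in enumerate(clusters)
            ]
            if new_centers == centers:
                break
            centers = new_centers
        wss = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, wss)
    return best

# Hypothetical data: age (tens) and income (tens of thousands) on very different scales.
data_rng = random.Random(42)
raw = [
    (data_rng.gauss(age, 2), data_rng.gauss(income, 3000))
    for age, income in [(25, 30000), (45, 60000), (65, 90000)]
    for _ in range(20)
]

scaled = standardize(raw)  # without this step, income would dominate the distance
wss_by_k = {k: kmeans_wss(scaled, k) for k in range(1, 7)}
for k, wss in wss_by_k.items():
    print(k, round(wss, 2))
```

Reading the printed values from top to bottom plays the role of the plot: a reasonable k is the point after which WSS stops dropping sharply. Because the result depends on the random starting centers, the sketch keeps the best of several starts for each k.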
If k-means does not appear to be an appropriate clustering technique for a given
dataset, then alternative techniques such as k-modes or Partitioning Around
Medoids (PAM) should be considered.
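To illustrate why k-modes suits categorical attributes, where a numeric distance is not meaningful: it is commonly described as counting the attributes on which two records disagree, and as summarizing each cluster by its per-attribute mode rather than by a mean. The sketch below shows only those two ingredients, not a full k-modes implementation; the function names and the toy records are hypothetical.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def cluster_mode(records):
    """Most frequent value of each attribute: the k-modes analogue of a centroid."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

# Hypothetical categorical records (color, size, shape).
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "small", "round"),
]

mode = cluster_mode(records)
print(mode)  # ('red', 'small', 'round')
print([matching_dissimilarity(r, mode) for r in records])  # [0, 1, 1]
```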
Once the clusters are identified, it is often useful to label them in some
descriptive way. Such labels make it easier to communicate the findings of the
clustering analysis, especially to upper management. In clustering, labels are
not preassigned to each object; they are assigned subjectively after the
clusters have been identified. Chapter 7 considers several methods to
perform the classification of objects with predetermined labels. Clustering can be
used with other analytical techniques, such as regression. Linear regression and
logistic regression are covered in Chapter 6, “Advanced Analytical Theory and
Methods: Regression.”