increase revenues and market share. Refer to [Dragoon 2005] for
more details about customer segmentation.
Solution Approach: Find Clusters of Similar Customers
Using ABCBank's customer data such as profiles, products owned,
and product usage, a clustering model can be built that identifies
customer segments. Each cluster in the model represents a customer
segment, that is, customers with similar characteristics. By
understanding the characteristics of the customers in each segment,
ABCBank can gain greater insight into product design and achieve
more focused campaigns.
Data Specification and Settings
In this example, we use the CUSTOMERS dataset discussed in
Section 7.1 for finding the natural groupings of the customers based
on customer attribute values. Attributes used for segmentation may
vary from those used for classification. For example, you may omit
the target attribute attrite or add customer product purchase
indicators. Section 4.6 introduced the concepts of clustering and clusters,
and Section 12.3 will include a more detailed discussion on customer
segmentation.
Clustering techniques vary in their approach to finding clusters;
examples include partitioning-based, hierarchical, density-based, and
grid-based algorithms. For more details on these techniques, refer to
[Han/Kamber 2006]. JDM defines a clustering mining function and one of
the popular partitioning-based clustering algorithms, called k-means.
Partitioning-based algorithms, such as k-means, typically require
users to specify the desired number of partitions, or clusters, k. The
algorithm then finds clusters that have high intra-cluster similarity
but low inter-cluster similarity. The k-means algorithm randomly
selects k cases to serve as the seeds (initial centroids) for the
clusters. It then measures the distance from each case to each cluster's
centroid and assigns the case to the “nearest” cluster. New cluster
centroids are computed from all the cases assigned to each cluster, and
the process repeats until the assignments stop changing or an iteration
limit is reached.
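The k-means steps described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the JDM API or any vendor's implementation: the function names k_means and squared_distance, the use of squared Euclidean distance, the stopping rule, and the six sample cases are all hypothetical choices made for this sketch.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two numeric cases (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(cases, k, max_iterations=100, seed=0):
    """Hypothetical helper: cluster `cases` (tuples of numbers) into k groups."""
    rng = random.Random(seed)
    # Step 1: randomly select k cases to serve as the initial centroids.
    centroids = [tuple(c) for c in rng.sample(cases, k)]
    assignments = None
    for _ in range(max_iterations):
        # Step 2: assign each case to the nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(case, centroids[j]))
            for case in cases
        ]
        # Stop once no case changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: recompute each centroid as the mean of its assigned cases.
        for j in range(k):
            members = [c for c, a in zip(cases, assignments) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(
                    sum(column) / len(members) for column in zip(*members)
                )
    return centroids, assignments


if __name__ == "__main__":
    # Made-up (age, income in thousands) cases, not taken from the book's tables.
    sample_cases = [(25, 30), (27, 32), (30, 35), (55, 90), (58, 95), (60, 100)]
    centroids, assignments = k_means(sample_cases, k=2)
    for case, cluster in zip(sample_cases, assignments):
        print(case, "-> cluster", cluster)
    print("centroids:", centroids)
```

Because the seeds are chosen at random, different seed values can lead to different cluster numbering and, on less clearly separated data, to different clusters. The sketch also uses the raw attribute values, so an attribute with a larger numeric range dominates the distance.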
To illustrate clustering concepts, we take a dataset of ten customer
cases with two attributes, age and income, listed in Table 7-11(a).
In clustering, one of the challenges is how to measure similarity
between cases. For example, numerical attributes may be in different
scales and categorical attributes may have discrete values, perhaps