3. Clustering and Visualization of
Retail Market Baskets
Joydeep Ghosh and Alexander Strehl
The University of Texas at Austin, Austin TX 78705, USA;
email: ghosh@ece.utexas.edu, alexander@strehl.com
Transaction analysis, including clustering of market baskets, is a key
application of data mining to the retail industry. This domain has some
specific requirements, such as the need for obtaining easily interpretable and
actionable results. It also exhibits some very challenging characteristics,
mostly stemming from the fact that the data have thousands of features and
are highly non-Gaussian and sparse. This chapter proposes a
relationship-based approach to clustering such data that tries to sidestep the
“curse-of-dimensionality” issue by working in a suitable similarity space
instead of the original high-dimensional feature space. This intermediary
similarity space can be suitably tailored to satisfy business criteria such as
requiring customer clusters to represent comparable amounts of revenue. We apply
efficient and scalable graph-partitioning-based clustering techniques in this
space. The output from the clustering algorithm is used to reorder the data
points so that the resulting permuted similarity matrix can be readily
visualized in two dimensions, with clusters showing up as bands. The
visualization is very helpful for assessing and improving clustering. For
example, actionable recommendations for splitting or merging clusters can be
easily derived, and it also guides the user toward a suitable number of
clusters. Results are presented on a real retail industry data set of several
thousand customers and products.
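The reordering step described above can be illustrated with a minimal sketch. It assumes a tiny binary customer-by-product matrix, cosine similarity as a stand-in similarity measure, and hand-assigned cluster labels in place of the output of a graph partitioner. None of these choices is taken from the chapter's own method, and the names `baskets`, `labels`, and `permuted_similarity` are hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty baskets
    unit = x / norms
    return unit @ unit.T

def permuted_similarity(sim, labels):
    """Reorder rows and columns so members of a cluster are adjacent."""
    order = np.argsort(labels, kind="stable")
    return sim[np.ix_(order, order)], order

# Toy data (assumption): 6 customers x 4 products, 1 = product was bought.
baskets = np.array([
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
], dtype=float)

# Hand-assigned labels (assumption); a real run would take these
# from the clustering algorithm.
labels = np.array([0, 1, 0, 1, 0, 1])

sim = cosine_similarity_matrix(baskets)
permuted, order = permuted_similarity(sim, labels)
```

After the permutation, the customers of each cluster occupy a contiguous block of rows and columns, so plotting `permuted` as an image (for example with a heat map) would show the clusters as bright blocks along the diagonal and the cross-cluster similarities as darker off-diagonal regions.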
3.1 Introduction
Knowledge discovery in databases often requires clustering the data into a
number of distinct segments or groups in an effective and efficient manner.
Good clusters show high similarity within a group and low similarity between
any two different groups. Grouping customers based on buying behavior
provides useful marketing decision support knowledge, especially in e-business
applications where electronically observed behavioral data are readily
available. Customer clusters can be used to identify up-selling and cross-selling
opportunities with existing customers [3.1]. One can also cluster products
that tend to sell together. Clustering of transactional data has widespread
applications in the retail industry. This chapter focuses on this important
application domain for data mining, first highlighting its unique requirements
and challenges and then proposing customized methods for clustering and
visualization of large-scale transactional data.
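The notion of a good clustering given in this section (high similarity within a group, low similarity between groups) can be checked numerically. The sketch below is only an illustration under stated assumptions: the similarity matrix is made up, the function name `within_between_similarity` is hypothetical, and plain averages are used rather than any quality measure defined in the chapter.

```python
import numpy as np

def within_between_similarity(sim, labels):
    """Average similarity of same-cluster pairs and of cross-cluster pairs.

    Assumes at least one cluster has two or more members and that there
    are at least two clusters; otherwise a mean is taken over no pairs.
    """
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)  # ignore self-similarity
    within = sim[same & off_diag].mean()
    between = sim[~same].mean()
    return within, between

# Made-up symmetric similarity matrix for four customers (assumption).
sim = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
])

good = within_between_similarity(sim, [0, 0, 1, 1])
bad = within_between_similarity(sim, [0, 1, 0, 1])
```

For the first labeling the within-cluster average exceeds the between-cluster average, which matches the description of a good clustering; for the second labeling the relation is reversed.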