Clustering and Visualization of Retail Market Baskets - Advanced Techniques in Knowledge Discovery and Data Mining

3. Clustering and Visualization of Retail Market Baskets

Joydeep Ghosh and Alexander Strehl

The University of Texas at Austin, Austin TX 78705, USA;

email: ghosh@ece.utexas.edu, alexander@strehl.com

Transaction analysis, including clustering of market baskets, is a key application of data mining to the retail industry. This domain has some specific requirements, such as the need for obtaining easily interpretable and actionable results. It also exhibits some very challenging characteristics, mostly stemming from the fact that the data have thousands of features and are highly non-Gaussian and sparse. This chapter proposes a relationship-based approach to clustering such data that tries to sidestep the “curse-of-dimensionality” issue by working in a suitable similarity space instead of the original high-dimensional feature space. This intermediary similarity space can be suitably tailored to satisfy business criteria such as requiring customer clusters to represent comparable amounts of revenue. We apply efficient and scalable graph-partitioning-based clustering techniques in this space. The output from the clustering algorithm is used to reorder the data points so that the resulting permuted similarity matrix can be readily visualized in two dimensions, with clusters showing up as bands. The visualization is very helpful for assessing and improving clustering. For example, actionable recommendations for splitting or merging clusters can be easily derived, and it also guides the user toward a suitable number of clusters. Results are presented on a real retail industry data set of several thousand customers and products.
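
The reordering step described above can be illustrated with a minimal sketch. It is not the chapter's implementation. The toy baskets, the use of plain Jaccard similarity, and the hard-coded cluster labels are all assumptions made for illustration. In practice the labels would come from a clustering algorithm. The sketch only shows how sorting customers by cluster label permutes the similarity matrix so that each cluster forms a contiguous block along the diagonal.

```python
# Toy baskets: each customer is the set of products bought (hypothetical data).
baskets = [
    {"milk", "bread", "eggs"},      # customer 0
    {"drill", "screws", "glue"},    # customer 1
    {"milk", "bread", "butter"},    # customer 2
    {"drill", "screws", "saw"},     # customer 3
    {"milk", "eggs", "butter"},     # customer 4
]


def jaccard(a, b):
    """Jaccard similarity of two sets (an assumed stand-in similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(items):
    """Pairwise similarity between all items, as a list of rows."""
    n = len(items)
    return [[jaccard(items[i], items[j]) for j in range(n)] for i in range(n)]


def permute_by_cluster(sim, labels):
    """Reorder rows and columns so members of the same cluster are adjacent."""
    # sorted() is stable, so the original order is kept within each cluster.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    permuted = [[sim[i][j] for j in order] for i in order]
    return order, permuted


# Cluster labels as a clustering algorithm might return them (assumed here).
labels = [0, 1, 0, 1, 0]

sim = similarity_matrix(baskets)
order, permuted = permute_by_cluster(sim, labels)

print("customer order:", order)
for row in permuted:
    print(" ".join(f"{value:.2f}" for value in row))
```

In the printed matrix, the grocery customers occupy the first three rows and columns and the hardware customers the last two, so high similarities concentrate in two diagonal blocks while the off-diagonal entries stay at zero.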

3.1 Introduction

Knowledge discovery in databases often requires clustering the data into a number of distinct segments or groups in an effective and efficient manner. Good clusters show high similarity within a group and low similarity between any two different groups. Grouping customers based on buying behavior provides useful marketing decision support knowledge, especially in e-business applications where electronically observed behavioral data are readily available. Customer clusters can be used to identify up-selling and cross-selling opportunities with existing customers [3.1]. One can also cluster products that tend to sell together. Clustering of transactional data has widespread applications in the retail industry. This chapter focuses on this important application domain for data mining, first highlighting its unique requirements and challenges and then proposing customized methods for clustering and visualization of large-scale transactional data.
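
The criterion that good clusters have high similarity within a group and low similarity between groups can be checked numerically. The following is a rough sketch under stated assumptions. The similarity matrix is made up, and `within_between` is a hypothetical helper, not a function from the chapter. It averages pairwise similarities separately for pairs that share a cluster and pairs that do not.

```python
from itertools import combinations


def mean(values):
    """Arithmetic mean, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def within_between(sim, labels):
    """Return (mean within-cluster similarity, mean between-cluster similarity)."""
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            within.append(sim[i][j])
        else:
            between.append(sim[i][j])
    return mean(within), mean(between)


# Hypothetical symmetric similarity matrix for four customers.
sim = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]

good = within_between(sim, [0, 0, 1, 1])  # groups the similar customers together
poor = within_between(sim, [0, 1, 0, 1])  # splits the similar customers apart

print("good grouping (within, between):", good)
print("poor grouping (within, between):", poor)
```

For the first labeling the within-cluster mean exceeds the between-cluster mean, and for the second labeling the relation is reversed, which is one simple way to compare candidate groupings.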
