Clustering and Visualization of Retail Market Baskets - Advanced Techniques in Knowledge Discovery and Data Mining


first and fourth columns. In our visualization this corresponds to brighter
off-diagonal regions and darker block-diagonal regions in the permuted matrix
S′ compared to the original S matrix. The proposed visualization technique is
quite powerful and versatile. In Figure 3.2(a) the chosen similarity behaves
randomly. Consequently, no strong visual difference between on- and
off-diagonal regions can be perceived with Clusion in S′. This indicates that
clustering is ineffective, which is expected because there is no structure in
the similarity matrix. Figure 3.2(b) is based on data consisting of pairwise
almost equidistant singletons. Clustering into two groups still renders the
on-diagonal regions very bright, suggesting more splits. In fact, this will
remain unchanged until each data point is a cluster by itself, thus revealing
the singleton character of the data.

For monolithic data (Fig. 3.2(c)), many strong similarities are indicated by
an almost uniformly dark similarity matrix S. Splitting the data results in
dark off-diagonal regions in the reordered matrix S′. A dark off-diagonal
region suggests that the clusters in the corresponding rows and columns should
be merged (or not be split in the first place). Clusion indicates that these
data are actually one large cluster. In Fig. 3.2(d), the gray-level
distribution of S exposes bright and dark pixels, thereby recommending that
the data be split. In this case, k = 2 apparently is a very good choice (and
the clustering algorithm worked well) because in S′ on-diagonal regions are
uniformly dark and off-diagonal regions are uniformly bright.
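
The reordering described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the helper names `reorder_by_cluster` and `block_means`, the toy similarity values (0.9 within a group, 0.1 across groups), and the use of NumPy are all assumptions made for the example.

```python
import numpy as np

def reorder_by_cluster(S, labels):
    """Permute rows and columns of S so that objects sharing a label are adjacent.

    Assumes labels are integers 0..k-1 (hypothetical convention for this sketch).
    """
    labels = np.asarray(labels)
    order = np.argsort(labels, kind="stable")         # objects grouped by cluster
    S_perm = S[np.ix_(order, order)]                  # same permutation on both axes
    boundaries = np.cumsum(np.bincount(labels))[:-1]  # where one cluster ends
    return S_perm, order, boundaries

def block_means(S, labels):
    """Mean similarity per (cluster a, cluster b) block, ignoring self-similarities."""
    labels = np.asarray(labels)
    k = labels.max() + 1
    M = np.full((k, k), np.nan)
    for a in range(k):
        for b in range(k):
            block = S[np.ix_(labels == a, labels == b)]
            if a == b:
                n = block.shape[0]
                if n > 1:  # drop the diagonal of self-similarities
                    M[a, b] = (block.sum() - np.trace(block)) / (n * (n - 1))
            else:
                M[a, b] = block.mean()
    return M

# Toy data: two groups whose members are interleaved in the original ordering.
labels = np.array([0, 1, 0, 1, 0, 1])
S = np.where(labels[:, None] == labels[None, :], 0.9, 0.1)
np.fill_diagonal(S, 1.0)

S_perm, order, boundaries = reorder_by_cluster(S, labels)
M = block_means(S, labels)
```

In the permuted matrix the high similarities gather in the diagonal blocks, and `M` summarizes each block by its mean. If matplotlib is available, `plt.imshow(S_perm, cmap="gray_r", vmin=0, vmax=1)` would render high similarities dark, matching the gray-level convention used in the text.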

This induces an intuitive mining process that guides the user to the
“right” number of clusters. Too small a k leaves the on-diagonal regions
inhomogeneous. Conversely, growing k beyond the natural number of clusters
will introduce dark off-diagonal regions. Finally, Clusion can be used to
visually compare the appropriateness of different similarity measures. Let us
assume, for example, that each row in Fig. 3.2 illustrates a particular way of
defining similarity for the same data set. Then Clusion makes visually
apparent that the similarity measure in (d) lends itself much better to
clustering than the measures illustrated in rows (a), (b), and (c).
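
The two visual cues for choosing k can also be put into numbers. The sketch below is an assumption-laden illustration, not part of Clusion: the function `block_diagnostics`, the synthetic similarity matrix with two natural clusters, and the hand-made candidate labelings for k = 1, 2, 4 are all hypothetical. It reports the largest spread of similarities inside any on-diagonal block (inhomogeneity) and the largest mean similarity of any off-diagonal block (a "dark" off-diagonal region).

```python
import numpy as np

def block_diagnostics(S, labels):
    """Return (max on-diagonal spread, max off-diagonal block mean) for a labeling."""
    labels = np.asarray(labels)
    k = labels.max() + 1
    on_spread, off_max = 0.0, 0.0
    for a in range(k):
        in_a = labels == a
        block = S[np.ix_(in_a, in_a)]
        # similarities between distinct objects of the same cluster
        pairs = block[~np.eye(block.shape[0], dtype=bool)]
        if pairs.size:
            on_spread = max(on_spread, float(pairs.std()))
        for b in range(a + 1, k):
            off_max = max(off_max, float(S[np.ix_(in_a, labels == b)].mean()))
    return on_spread, off_max

# Synthetic data with two natural clusters of four objects each (assumed values).
truth = np.array([0] * 4 + [1] * 4)
S = np.where(truth[:, None] == truth[None, :], 0.9, 0.1)
np.fill_diagonal(S, 1.0)

# Hand-made labelings standing in for the output of a clustering algorithm.
candidates = {
    1: np.zeros(8, dtype=int),                 # k too small
    2: truth,                                  # natural number of clusters
    4: np.array([0, 0, 1, 1, 2, 2, 3, 3]),     # k too large
}
results = {k: block_diagnostics(S, lab) for k, lab in candidates.items()}

for k, (spread, off) in results.items():
    print(f"k={k}: on-diagonal spread={spread:.3f}, max off-diagonal mean={off:.3f}")
```

On this toy matrix, k = 1 yields a large on-diagonal spread, k = 4 yields an off-diagonal block with high mean similarity, and only k = 2 keeps both quantities small, which mirrors the visual argument in the text.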

An interactive tool that facilitates exploration of the merge/split process
can be experienced at http://lans.ece.utexas.edu/~strehl/.

3.4.3 Comparison

Clusion gives a relationship-centered view, as contrasted with common
projective techniques, such as the selection of dominant features or optimal
linear projections (PCA), which are object-centered. In Clusion, the actual
features are transparent; instead, all pairwise relationships, the relevant
aspect for the purpose of clustering, are displayed.
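
The contrast between the two views can be made concrete with a small sketch. Everything here is assumed for illustration: the random data, the use of cosine similarity as the pairwise measure, and PCA computed through NumPy's SVD. The object-centered view gives each object coordinates derived from its features, while the relationship-centered view keeps only an n × n matrix of pairwise similarities.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))            # 6 objects, 4 features (synthetic)

# Object-centered view: project each object onto the top two principal axes.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Y = Xc @ Vt[:2].T                      # shape (6, 2): one point per object

def cosine_similarity_matrix(A):
    """Pairwise cosine similarities between the rows of A (assumed measure)."""
    unit = A / np.linalg.norm(A, axis=1, keepdims=True)
    return unit @ unit.T

# Relationship-centered view: one entry per pair of objects.
S = cosine_similarity_matrix(X)        # shape (6, 6)

# Rotating the feature space changes every feature value, yet leaves the
# pairwise relationships untouched: the features themselves are not displayed.
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # random orthogonal matrix
S_rotated = cosine_similarity_matrix(X @ Q)
```

Here `S` and `S_rotated` agree up to rounding even though `X` and `X @ Q` differ feature by feature, which is one way to read the statement that the features are transparent in a relationship-centered display.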

Figure 3.3 compares Clusion with other popular visualizations. In Fig. 3.3(a),
parallel axis, PCA projection, CViz (projection through the plane defined by
the centroids of clusters 1, 2, and 3), as well as Clusion succeed in
visualizing the IRIS data. Membership in cluster 1/2/3 is indicated by colors
red/blue/green (parallel axis), colors red/blue/green and shapes ◦/×/+ (PCA
and CViz),
