An Overview of Data Mining Techniques - Data Mining Techniques in CRM: Inside Customer Segmentation


Table 2.7 The proximity matrix of Euclidean distances between all pairs of customers.

Customer       1       2       3       4       5       6
1            0.0   100.1   114.9   157.3   144.0    24.0
2          100.1     0.0    16.6    95.8    76.0    77.9
3          114.9    16.6     0.0    84.5    64.4    93.4
4          157.3    95.8    84.5     0.0    20.1   145.0
5          144.0    76.0    64.4    20.1     0.0   129.7
6           24.0    77.9    93.4   145.0   129.7     0.0

in successive steps. Although clustering algorithms have changed a great deal since this algorithm was introduced, it still offers a helpful graphical representation of what clustering is all about. Nowadays, in order to handle large volumes of data, algorithms use more efficient distance measures and approaches that do not require calculating the distances between all pairs of records. A specific type of neural network is even applied to clustering; however, the main concept is always the same: the grouping of homogeneous records. Typical clustering tasks involve mining thousands of records with tens or hundreds of attributes, which makes them far more complicated than our simplified exercise. Tasks like these are impossible to handle without specialized algorithms that aim to uncover the underlying groups automatically.
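To make the idea of merging records in successive steps concrete, the following Python sketch runs agglomerative clustering directly on the distances of Table 2.7. This excerpt does not state which linkage rule the exercise uses, so single linkage (the distance between two groups is the smallest distance between any of their members) is an assumption here, and the function name `single_linkage` is hypothetical. At each step the two closest groups are merged until only one group remains.

```python
# Table 2.7: Euclidean distances between customers 1..6 (values from the text).
DIST = [
    [0.0, 100.1, 114.9, 157.3, 144.0, 24.0],
    [100.1, 0.0, 16.6, 95.8, 76.0, 77.9],
    [114.9, 16.6, 0.0, 84.5, 64.4, 93.4],
    [157.3, 95.8, 84.5, 0.0, 20.1, 145.0],
    [144.0, 76.0, 64.4, 20.1, 0.0, 129.7],
    [24.0, 77.9, 93.4, 145.0, 129.7, 0.0],
]


def single_linkage(dist):
    """Agglomerative clustering with single linkage (an assumed choice).

    Returns the merge history as (group_a, group_b, distance) tuples,
    where groups are frozensets of 1-based customer IDs.
    """
    clusters = [frozenset([i + 1]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of current groups with the smallest linkage distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i - 1][j - 1] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] | clusters[b]
        # Delete index b first (b > a), so index a still points at the same group.
        del clusters[b]
        clusters[a] = merged
    return merges


for left, right, d in single_linkage(DIST):
    print(sorted(left), "+", sorted(right), "at", d)
```

Under this assumption customers 2 and 3, then 4 and 5, then 1 and 6 are merged first (at 16.6, 20.1 and 24.0), after which {2, 3} joins {4, 5} at 64.4 and {1, 6} joins the rest at 77.9. With complete or average linkage the first three merges would be the same, but the later merge distances would differ.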

One thing that should be made crystal clear about clustering is that it groups records according to the observed input data patterns. Therefore, the data miners and marketers involved should decide in advance, according to the specific business objective, the segmentation level and the segmentation criteria (in other words, the clustering fields). For example, if we want to segment bank customers according to their product balances, we must prepare a modeling dataset with balance information at the customer level. Even if our original input data are in a transactional format or stored at the product account level, the selected segmentation level requires a modeling dataset with a unique record per customer and with fields that summarize their product balances.
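The customer-level preparation described above can be sketched in Python as a simple roll-up. The account rows, product names, and the function `customer_balances` are hypothetical illustrations, not data or code from the book; the point is only that several account-level records per customer collapse into one record per customer, with one summarized balance field per product.

```python
from collections import defaultdict

# Hypothetical account-level records: (customer_id, product, balance).
accounts = [
    ("C001", "savings", 1200.0),
    ("C001", "savings", 300.0),
    ("C001", "deposits", 5000.0),
    ("C002", "savings", 800.0),
    ("C002", "loans", 15000.0),
]

PRODUCTS = ("savings", "deposits", "loans")  # hypothetical product list


def customer_balances(rows, products=PRODUCTS):
    """Roll account-level rows up to one record per customer.

    Each output record holds the summed balance per product. Products a
    customer does not hold stay at 0.0, so every record has the same fields.
    A product missing from `products` raises KeyError, exposing bad input early.
    """
    totals = defaultdict(lambda: dict.fromkeys(products, 0.0))
    for customer_id, product, balance in rows:
        totals[customer_id][product] += balance
    return dict(totals)


for cid, fields in customer_balances(accounts).items():
    print(cid, fields)
```

Filling absent products with 0.0 rather than leaving them out keeps the modeling dataset rectangular, which distance-based clustering algorithms generally require.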

In general, clustering algorithms provide an exhaustive and mutually exclusive solution: they automatically assign each record to exactly one of the uncovered groups. They produce disjoint clusters and generate a cluster membership field that denotes the group of each record, as shown in Table 2.8.
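The book does not name a specific algorithm at this point, so the sketch below swaps in k-means (Lloyd's algorithm), one common partitioning method, purely to illustrate the kind of output described: each record receives exactly one cluster label, so the clusters are disjoint and together cover all records. The records, field names, and the function `kmeans_membership` are hypothetical, and the initial centroids are simply the first k records, chosen only to keep the example deterministic.

```python
import math

# Hypothetical customer-level records with two clustering fields.
customers = [
    {"id": "C1", "savings": 1.0, "loans": 1.0},
    {"id": "C2", "savings": 1.5, "loans": 2.0},
    {"id": "C3", "savings": 8.0, "loans": 8.0},
    {"id": "C4", "savings": 9.0, "loans": 8.5},
    {"id": "C5", "savings": 1.2, "loans": 0.8},
]
FIELDS = ("savings", "loans")  # clustering fields chosen in advance


def kmeans_membership(records, fields, k, iterations=20):
    """Minimal k-means that returns copies of the records with a 'cluster' field.

    Initial centroids are the first k records (for determinism only).
    """
    points = [tuple(r[f] for f in fields) for r in records]
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each record goes to exactly one (the nearest) centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    # Cluster membership field, numbered from 1.
    return [dict(r, cluster=lab + 1) for r, lab in zip(records, labels)]


for rec in kmeans_membership(customers, FIELDS, k=2):
    print(rec["id"], rec["cluster"])
```

With these made-up records, C1, C2 and C5 end up in cluster 1 and C3 and C4 in cluster 2. Labeling the clusters (for example, "low balances" versus "high balances") remains a separate, human step.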

In our illustrative exercise we have discovered the differentiating characteristics of each cluster and labeled them accordingly. In practice, this process is not so easy and may involve many different attributes, even those not directly participating
