An Overview of Data Mining Techniques - Data Mining Techniques in CRM: Inside Customer Segmentation

Database Reference

In-Depth Information

Figure 2.11 Graphical representation of clustering.

records with similar input data patterns, hence similar behavioral profiles, to the

same cluster.

Nowadays, various clustering algorithms are available, which differ in their

approach for assessing the similarity of records and in the criteria they use

to determine the final number of clusters. The whole clustering ''revolution''

started with a simple and intuitive distance measure, still used by some clustering

algorithms today, called the Euclidean distance. The Euclidean distance of two

records or objects is a dissimilarity measure calculated as the square root of the sum

of the squared differences between the values of the examined attributes/fields. In

our example the Euclidean distance between customers 1 and 6 would be:

√ [(Customer 1 voice usage

Customer 6 voice usage) 2

−

Customer 6 SMS usage) 2 ]

+

(Customer 1 SMS usage

−

=

24

This value denotes the disparity of customers 1 and 6 and is represented in

the respective scatterplot by the length of the straight line that connects points

1 and 6. The Euclidean distances for all pairs of customers are summarized in

Table 2.7.

A traditional clustering algorithm, named agglomerative or hierarchical clus-

tering, works by evaluating the Euclidean distances between all pairs of records,

literally the length of their connecting lines, and begins to group them accordingly

Search WWH ::

Custom Search

Home