Table 2.7 The proximity matrix of Euclidean distances between all pairs of customers.

Euclidean distance       1       2       3       4       5       6
1                      0.0   100.1   114.9   157.3   144.0    24.0
2                    100.1     0.0    16.6    95.8    76.0    77.9
3                    114.9    16.6     0.0    84.5    64.4    93.4
4                    157.3    95.8    84.5     0.0    20.1   145.0
5                    144.0    76.0    64.4    20.1     0.0   129.7
6                     24.0    77.9    93.4   145.0   129.7     0.0

in successive steps. Although many things have changed in clustering algorithms
since the inception of this algorithm, it is nice to have a graphical representation
of what clustering is all about. Nowadays, in an effort to handle large volumes of
data, algorithms use more efficient distance measures and approaches which do
not require the calculation of the distances between all pairs of records. Even a
specific type of neural network is applied for clustering; however, the main concept
is always the same - the grouping of homogeneous records. Typical clustering
tasks involve the mining of thousands of records and tens or hundreds of attributes.
Things are much more complicated than in our simplified exercise. Tasks like this
are impossible to handle without the help of specialized algorithms that aim to
automatically uncover the underlying groups.
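
To make the idea of merging records in successive steps concrete, here is a minimal Python sketch that works directly on the distances in Table 2.7. It assumes single linkage (the distance between two clusters is the distance between their closest members) and an arbitrary stopping point of three clusters. Both choices, and the function name single_linkage, are illustrative assumptions rather than the procedure followed in the text.

```python
# Proximity matrix from Table 2.7 (customers 1-6, Euclidean distances).
DIST = [
    [0.0, 100.1, 114.9, 157.3, 144.0, 24.0],
    [100.1, 0.0, 16.6, 95.8, 76.0, 77.9],
    [114.9, 16.6, 0.0, 84.5, 64.4, 93.4],
    [157.3, 95.8, 84.5, 0.0, 20.1, 145.0],
    [144.0, 76.0, 64.4, 20.1, 0.0, 129.7],
    [24.0, 77.9, 93.4, 145.0, 129.7, 0.0],
]


def single_linkage(dist, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    Assumption: single linkage, i.e. cluster-to-cluster distance is the
    smallest distance between any pair of their members.
    """
    clusters = [{i + 1} for i in range(len(dist))]  # customer ids are 1-based
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i - 1][j - 1]
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] | clusters[b]
        del clusters[b]
    return clusters, merges


clusters, merges = single_linkage(DIST, n_clusters=3)
for left, right, d in merges:
    print(f"merge {left} + {right} at distance {d}")
print("clusters:", [sorted(c) for c in clusters])
```

Every iteration rescans all pairs of clusters, which is acceptable for six customers but is exactly the all-pairs cost that algorithms built for large datasets try to avoid.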
One thing that should be made crystal clear about clustering is that it
groups records according to the observed input data patterns. Thus, the data
miners and marketers involved should decide in advance, according to the specific
business objective, the segmentation level and the segmentation criteria - in other
words, the clustering fields. For example, if we want to segment bank customers
according to their product balances, we must prepare a modeling dataset with
balance information at a customer level. Even if our original input data are in a
transactional format or stored at a product account level, the selected segmentation
level requires a modeling dataset with a unique record per customer and with
fields that would summarize their product balances.
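
The step of bringing account-level data up to the customer level can be sketched as follows. The records, product names, and the helper to_customer_level are all hypothetical and chosen only for illustration. A real modeling dataset would be built from the bank's own tables and would probably carry many more summary fields.

```python
from collections import defaultdict

# Hypothetical account-level records: (customer_id, product, balance).
accounts = [
    ("C1", "savings", 1200.0),
    ("C1", "savings", 300.0),
    ("C1", "mortgage", 85000.0),
    ("C2", "savings", 50.0),
    ("C2", "credit_card", 640.0),
]

# Hypothetical list of products that become the clustering fields.
PRODUCTS = ["savings", "mortgage", "credit_card"]


def to_customer_level(rows, products):
    """Return one record per customer with the total balance per product."""
    totals = defaultdict(lambda: dict.fromkeys(products, 0.0))
    for customer_id, product, balance in rows:
        totals[customer_id][product] += balance
    return dict(totals)


customers = to_customer_level(accounts, PRODUCTS)
for customer_id, balances in customers.items():
    print(customer_id, balances)
```

Each customer now appears exactly once, and products the customer does not hold are filled with a zero balance so that every record has the same fields.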
In general, clustering algorithms provide an exhaustive and mutually exclusive
solution. They automatically assign each record to one of the uncovered groups.
They produce disjoint clusters and generate a cluster membership field that
denotes the group of each record, as shown in Table 2.8.
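
As a rough illustration of what a cluster membership field looks like, the sketch below assigns each record to its nearest cluster center. The records, the two centers, and the nearest-center rule are assumptions made for this example. The rule resembles the assignment step of centroid-based algorithms such as k-means and is not necessarily the method behind Table 2.8.

```python
import math

# Hypothetical cluster centers and records, each described by two fields.
centroids = {"A": (10.0, 10.0), "B": (80.0, 20.0)}
records = {"r1": (12.0, 9.0), "r2": (75.0, 25.0), "r3": (40.0, 15.0)}


def assign_membership(records, centroids):
    """Map every record id to the label of its nearest cluster center."""
    return {
        record_id: min(centroids, key=lambda c: math.dist(point, centroids[c]))
        for record_id, point in records.items()
    }


membership = assign_membership(records, centroids)
print(membership)
```

Because each record receives exactly one label, the resulting field is exhaustive (no record is left out) and mutually exclusive (no record belongs to two clusters).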
In our illustrative exercise we have discovered the differentiating characteristics
of each cluster and labeled them accordingly. In practice, this process is not so
easy and may involve many different attributes, even those not directly participating