Database Reference
In-Depth Information
revealed by analyzing the observed input data patterns. Clustering techniques
assess the similarity of the records or customers with respect to the clustering fields
and assign them to the revealed clusters accordingly. The goal is to detect groups
with internal homogeneity and interclass heterogeneity.
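The notion of similarity between records can be made concrete with a distance measure. The sketch below assumes Euclidean distance over two numeric clustering fields. The text does not say which measure is used, and the function name and the record values are hypothetical, chosen only for illustration.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two records given as equal-length numeric tuples."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of fields")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical records (made-up values): (voice minutes, SMS count).
record_a = (300.0, 20.0)
record_b = (303.0, 24.0)
record_c = (60.0, 200.0)

d_ab = euclidean_distance(record_a, record_b)  # small distance: similar records
d_ac = euclidean_distance(record_a, record_c)  # large distance: dissimilar records
```

Under this measure, records a and b would be candidates for the same cluster, while record c would belong elsewhere. In practice the fields are often rescaled first so that no single field dominates the distance.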
Clustering techniques are quite popular and their use is widespread in data
mining and market research. They can support the development of different seg-
mentation schemes according to the clustering attributes used: namely, behavioral,
attitudinal, or demographic segmentation.
The major advantage of the clustering techniques is that they can efficiently
manage a large number of attributes and create data-driven segments. The created
segments are not based on a priori personal concepts, intuitions, and perceptions of
the business people. They are induced by the observed data patterns and, provided
they are built properly, they can lead to results with real business meaning and
value. Clustering models can analyze complex input data patterns and suggest
solutions that would not otherwise be apparent. They reveal customer typologies,
enabling tailored marketing strategies. In later chapters we will have the chance to
present real-world applications from major industries such as telecommunications
and banking, which will highlight the true benefits of data mining-derived clustering
solutions.
Unlike classification modeling, in clustering there is no predefined set of
classes. There are no predefined categories such as churners/non-churners or
buyers/non-buyers and there is also no historical dataset with pre-classified records.
It is up to the algorithm to uncover and define the classes and assign each record
to its "nearest" or, in other words, its most similar cluster. To present the basic
concepts of clustering, let us consider the hypothetical case of a mobile telephony
network operator that wants to segment its customers according to their voice and
SMS usage. The available demographic data are not used as clustering inputs in
this case since the objective concerns the grouping of customers according only to
behavioral criteria.
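To illustrate how an algorithm can define clusters and assign each record to its nearest one, here is a minimal sketch of a k-means-style procedure. K-means is one common choice and is an assumption here, since the text does not name a specific algorithm. The customer IDs, usage values, and starting centers are all hypothetical and are not taken from Table 2.6.

```python
import math


def nearest_center(point, centers):
    """Index of the center closest to the point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def k_means(points, initial_centers, max_iter=100):
    """Alternate between assigning points to centers and recomputing centers."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        labels = [nearest_center(p, centers) for p in points]
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                # New center is the field-by-field mean of the cluster members.
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(old)
        if new_centers == centers:
            break  # centers stopped moving, so assignments are stable
        centers = new_centers
    labels = [nearest_center(p, centers) for p in points]
    return labels, centers


# Hypothetical behavioral inputs (made-up values): (voice minutes, SMS count).
usage = {
    "A": (310, 15),
    "B": (120, 60),
    "C": (130, 70),
    "D": (40, 220),
    "E": (50, 240),
    "F": (290, 25),
}
points = list(usage.values())
seeds = [usage["A"], usage["B"], usage["D"]]  # arbitrary starting centers

labels, centers = k_means(points, seeds)
clusters = dict(zip(usage, labels))
```

Only the two usage fields enter the distance calculation, which mirrors the decision to leave demographic data out of the clustering inputs. The result depends on the starting centers and on the scale of each field, so real applications typically standardize the inputs and try several initializations.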
The input dataset, for a few imaginary customers, is presented in Table 2.6.
In the scatterplot in Figure 2.10, these customers are positioned in a
two-dimensional space according to their voice usage, along the X-axis, and their
SMS usage, along the Y-axis.
The clustering procedure is depicted in Figure 2.11, where voice and SMS
usage intensity are represented by the corresponding symbols.
Examination of the scatterplot reveals specific similarities among the cus-
tomers. Customers 1 and 6 appear close together and present heavy voice usage
and low SMS usage. They can be placed in a single group which we label as "Heavy
voice users." Similarly, customers 2 and 3 also appear close together but far apart
from the rest. They form a group of their own, characterized by average voice and
SMS usage. Therefore one more cluster has been disclosed, which can be labeled
as "Typical users." Finally, customers 4 and 5 also seem to be different from the