decide on the number of clusters to retain. This algorithm cannot effectively
handle more than a few thousand cases, so it cannot be applied directly to
most business clustering tasks. A common workaround is to use it on a sample of
the clustering population. However, with numerous other efficient algorithms
that can easily handle millions of records, clustering through sampling is not
considered an ideal approach.
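The sampling workaround described above can be sketched as follows. The sketch assumes SciPy as the implementation (the text does not prescribe a tool) and uses a synthetic two-group "population" for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Hypothetical clustering population: 100,000 records with two numeric fields.
population = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(50_000, 2)),
    rng.normal(loc=5.0, scale=1.0, size=(50_000, 2)),
])

# Draw a manageable sample; hierarchical clustering needs all pairwise
# distances, so memory grows quadratically with the number of cases.
sample = population[rng.choice(len(population), size=2_000, replace=False)]

# Ward-linkage agglomeration on the sample, then cut the tree into 2 clusters.
tree = linkage(sample, method="ward")
labels = fcluster(tree, t=2, criterion="maxclust")
```

The full population would then have to be scored against the sample-derived clusters in a separate step, which is exactly the indirection that makes sampling less attractive than scalable algorithms.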
K-means: This is an efficient, and perhaps the fastest, clustering algorithm.
It can handle both long (many records) and wide (many data dimensions and
input fields) datasets. It is a distance-based clustering technique and, unlike the
hierarchical algorithm, it does not need to calculate the distances between all
pairs of records. The number of clusters to be formed is predetermined and
specified by the user in advance. Usually a number of different solutions should
be tried and evaluated before approving the most appropriate one. It is best for
handling continuous clustering fields.
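The try-several-solutions cycle mentioned above can be sketched with scikit-learn (an assumed tool, not one named by the text), scoring a few candidate cluster counts on synthetic continuous fields with the silhouette criterion:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
# Hypothetical continuous clustering fields: 3,000 records in 4 dimensions,
# generated around three well-separated centres.
X = np.vstack([rng.normal(c, 0.5, size=(1_000, 4)) for c in (0.0, 4.0, 8.0)])

# Fit one K-means solution per candidate k and keep its silhouette score.
scores = {}
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
```

In practice the silhouette is only one of several evaluation criteria; business interpretability of the resulting segments matters just as much as the statistics.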
TwoStep cluster: As its name implies, this scalable and efficient clustering
model, included in IBM SPSS Modeler (formerly Clementine), processes
records in two steps. The first step of pre-clustering makes a single pass through
the data and assigns records to a limited set of initial subclusters. In the second
step, initial subclusters are further grouped, through hierarchical clustering, into
the final segments. It suggests a clustering solution by automatic clustering: the
optimal number of clusters can be automatically determined by the algorithm
according to specific criteria.
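The two-step idea can be roughly imitated with open-source components. The sketch below is an approximation of the scheme described above, not the actual TwoStep algorithm: a cheap single-pass pre-clustering into many subclusters, then hierarchical grouping of the subcluster centroids, with the number of final segments chosen automatically by the silhouette criterion:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(2)
# Synthetic data: 6,000 records around three separated centres.
X = np.vstack([rng.normal(c, 0.4, size=(2_000, 2)) for c in (0.0, 3.0, 6.0)])

# Step 1 (pre-clustering): a fast pass assigns records to many subclusters.
pre = MiniBatchKMeans(n_clusters=50, n_init=3, random_state=2).fit(X)
centroids = pre.cluster_centers_

# Step 2: hierarchical clustering of the subcluster centroids; the number of
# final segments is picked automatically by maximizing the silhouette.
best_k, best_score = None, -1.0
for k in range(2, 8):
    seg = AgglomerativeClustering(n_clusters=k).fit_predict(centroids)
    score = silhouette_score(centroids, seg)
    if score > best_score:
        best_k, best_score = k, score
        final = seg[pre.labels_]  # map each record via its subcluster
```

Working on centroids rather than raw records is what makes the second, expensive hierarchical step affordable at scale.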
Kohonen network/Self-Organizing Map (SOM): Kohonen networks are
based on neural networks and typically produce a two-dimensional grid or map
of the clusters, hence the name self-organizing maps. Kohonen networks usually
take longer to train than the K-means and TwoStep algorithms, but they
provide a different view of clustering that is worth trying.
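A minimal SOM can be written in plain NumPy. The grid size, learning-rate schedule, and neighborhood radius below are illustrative choices, not values prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy data: two well-separated groups in two dimensions.
X = np.vstack([rng.normal(0.0, 0.3, (200, 2)), rng.normal(4.0, 0.3, (200, 2))])

# A 3x3 self-organizing map: each grid node holds a weight vector in data space.
grid_w, grid_h = 3, 3
coords = np.array([(i, j) for i in range(grid_w) for j in range(grid_h)], float)
weights = rng.normal(2.0, 1.0, (grid_w * grid_h, 2))

for epoch in range(20):
    lr = 0.5 * (1 - epoch / 20)             # decaying learning rate
    radius = 1.5 * (1 - epoch / 20) + 0.5   # shrinking neighborhood
    for x in X[rng.permutation(len(X))]:
        # Best-matching unit: the grid node nearest to the input record.
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        # Grid neighbors of the BMU also move toward the input, with a
        # Gaussian weight that falls off with grid distance.
        d = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d / (2 * radius ** 2))
        weights += lr * h[:, None] * (x - weights)

# Each record maps to its nearest node, yielding the two-dimensional map.
mapped = np.argmin(((X[:, None, :] - weights[None, :, :]) ** 2).sum(-1), axis=1)
```

The neighborhood update is what arranges similar clusters next to each other on the grid, which is the property that makes the resulting map easy to visualize.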
Apart from segmentation, clustering techniques can also be used for other
purposes, for example, as a preparatory step for optimizing the results of predictive
models. Homogeneous customer groups can be revealed by clustering and then
separate, more targeted predictive models can be built within each cluster.
Alternatively, the derived cluster membership field can also be included in the list of
predictors in a supervised model. Since the cluster field combines information from
many other fields, it often has significant predictive power. Another application
of clustering is in the identification of unusual records. Small or outlier clusters
could contain records with increased significance that are worth closer inspection.
Similarly, records far apart from the majority of the cluster members might also
indicate anomalous cases that require special attention.
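The second anomaly-detection idea above, records far from the bulk of their own cluster, can be sketched with scikit-learn (an assumed tool) on synthetic data with one planted outlier:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
# Two normal groups plus one planted anomalous record (index 1000).
X = np.vstack([
    rng.normal(0.0, 0.5, (500, 2)),
    rng.normal(5.0, 0.5, (500, 2)),
    np.array([[20.0, 20.0]]),
])

km = KMeans(n_clusters=2, n_init=10, random_state=4).fit(X)

# Distance of every record to the centre of its assigned cluster.
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

# Records far beyond the typical distance are flagged for closer inspection;
# the mean-plus-three-standard-deviations threshold is an illustrative choice.
threshold = dist.mean() + 3 * dist.std()
outliers = np.where(dist > threshold)[0]
```

The same distances can also rank records for manual review, and unusually small clusters can be screened simply by inspecting cluster sizes.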
The clustering techniques are further explained and presented in detail in the
next chapter.