decide on the number of clusters to retain. This algorithm cannot effectively
handle more than a few thousand cases, so it cannot be applied directly to
most business clustering tasks. A common workaround is to use it on a sample of
the clustering population. However, with numerous other efficient algorithms
that can easily handle millions of records, clustering through sampling is not
considered an ideal approach.
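The sampling workaround described above can be sketched as follows. The sketch assumes SciPy as the implementation (the text does not prescribe a tool) and uses a synthetic two-group "population" for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Hypothetical clustering population: 100,000 records with two numeric fields.
population = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(50_000, 2)),
    rng.normal(loc=5.0, scale=1.0, size=(50_000, 2)),
])

# Draw a manageable sample; hierarchical clustering needs all pairwise
# distances, so memory grows quadratically with the number of cases.
sample = population[rng.choice(len(population), size=2_000, replace=False)]

# Ward-linkage agglomeration on the sample, then cut the tree into 2 clusters.
tree = linkage(sample, method="ward")
labels = fcluster(tree, t=2, criterion="maxclust")
```

The full population would then have to be scored against the sample-derived clusters in a separate step, which is exactly the indirection that makes sampling less attractive than scalable algorithms.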
K-means: This is an efficient, and perhaps the fastest, clustering algorithm.
It can handle both long (many records) and wide (many data dimensions and
input fields) datasets. It is a distance-based clustering technique and, unlike the
hierarchical algorithm, it does not need to calculate the distances between all
pairs of records. The number of clusters to be formed is predetermined and
specified by the user in advance. Usually a number of different solutions should
be tried and evaluated before approving the most appropriate one. It is best for
handling continuous clustering fields.
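The try-several-solutions cycle mentioned above can be sketched with scikit-learn (an assumed tool, not one named by the text), scoring a few candidate cluster counts on synthetic continuous fields with the silhouette criterion:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
# Hypothetical continuous clustering fields: 3,000 records in 4 dimensions,
# generated around three well-separated centres.
X = np.vstack([rng.normal(c, 0.5, size=(1_000, 4)) for c in (0.0, 4.0, 8.0)])

# Fit one K-means solution per candidate k and keep its silhouette score.
scores = {}
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
```

In practice the silhouette is only one of several evaluation criteria; business interpretability of the resulting segments matters just as much as the statistics.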
TwoStep cluster: As its name implies, this scalable and efficient clustering
model, included in IBM SPSS Modeler (formerly Clementine), processes
records in two steps. The first step of pre-clustering makes a single pass through
the data and assigns records to a limited set of initial subclusters. In the second
step, initial subclusters are further grouped, through hierarchical clustering, into
the final segments. It suggests a clustering solution by automatic clustering: the
optimal number of clusters can be automatically determined by the algorithm
according to specific criteria.
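The two-step idea can be roughly imitated with open-source components. The sketch below is an approximation of the scheme described above, not the actual TwoStep algorithm: a cheap single-pass pre-clustering into many subclusters, then hierarchical grouping of the subcluster centroids, with the number of final segments chosen automatically by the silhouette criterion:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(2)
# Synthetic data: 6,000 records around three separated centres.
X = np.vstack([rng.normal(c, 0.4, size=(2_000, 2)) for c in (0.0, 3.0, 6.0)])

# Step 1 (pre-clustering): a fast pass assigns records to many subclusters.
pre = MiniBatchKMeans(n_clusters=50, n_init=3, random_state=2).fit(X)
centroids = pre.cluster_centers_

# Step 2: hierarchical clustering of the subcluster centroids; the number of
# final segments is picked automatically by maximizing the silhouette.
best_k, best_score = None, -1.0
for k in range(2, 8):
    seg = AgglomerativeClustering(n_clusters=k).fit_predict(centroids)
    score = silhouette_score(centroids, seg)
    if score > best_score:
        best_k, best_score = k, score
        final = seg[pre.labels_]  # map each record via its subcluster
```

Working on centroids rather than raw records is what makes the second, expensive hierarchical step affordable at scale.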
Kohonen network/Self-Organizing Map (SOM): Kohonen networks are
based on neural networks and typically produce a two-dimensional grid or map
of the clusters, hence the name self-organizing maps. Kohonen networks usually
take longer to train than the K-means and TwoStep algorithms, but they
provide a different view of clustering that is worth trying.
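A minimal SOM can be written in plain NumPy. The grid size, learning-rate schedule, and neighborhood radius below are illustrative choices, not values prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy data: two well-separated groups in two dimensions.
X = np.vstack([rng.normal(0.0, 0.3, (200, 2)), rng.normal(4.0, 0.3, (200, 2))])

# A 3x3 self-organizing map: each grid node holds a weight vector in data space.
grid_w, grid_h = 3, 3
coords = np.array([(i, j) for i in range(grid_w) for j in range(grid_h)], float)
weights = rng.normal(2.0, 1.0, (grid_w * grid_h, 2))

for epoch in range(20):
    lr = 0.5 * (1 - epoch / 20)             # decaying learning rate
    radius = 1.5 * (1 - epoch / 20) + 0.5   # shrinking neighborhood
    for x in X[rng.permutation(len(X))]:
        # Best-matching unit: the grid node nearest to the input record.
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        # Grid neighbors of the BMU also move toward the input, with a
        # Gaussian weight that falls off with grid distance.
        d = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d / (2 * radius ** 2))
        weights += lr * h[:, None] * (x - weights)

# Each record maps to its nearest node, yielding the two-dimensional map.
mapped = np.argmin(((X[:, None, :] - weights[None, :, :]) ** 2).sum(-1), axis=1)
```

The neighborhood update is what arranges similar clusters next to each other on the grid, which is the property that makes the resulting map easy to visualize.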
Apart from segmentation, clustering techniques can also be used for other
purposes, for example, as a preparatory step for optimizing the results of predictive
models. Homogeneous customer groups can be revealed by clustering and then
separate, more targeted predictive models can be built within each cluster.
Alternatively, the derived cluster membership field can also be included in the list of
predictors in a supervised model. Since the cluster field combines information from
many other fields, it often has significant predictive power. Another application
of clustering is in the identification of unusual records. Small or outlier clusters
could contain records with increased significance that are worth closer inspection.
Similarly, records far apart from the majority of the cluster members might also
indicate anomalous cases that require special attention.
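The second anomaly-detection idea above, records far from the bulk of their own cluster, can be sketched with scikit-learn (an assumed tool) on synthetic data with one planted outlier:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
# Two normal groups plus one planted anomalous record (index 1000).
X = np.vstack([
    rng.normal(0.0, 0.5, (500, 2)),
    rng.normal(5.0, 0.5, (500, 2)),
    np.array([[20.0, 20.0]]),
])

km = KMeans(n_clusters=2, n_init=10, random_state=4).fit(X)

# Distance of every record to the centre of its assigned cluster.
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

# Records far beyond the typical distance are flagged for closer inspection;
# the mean-plus-three-standard-deviations threshold is an illustrative choice.
threshold = dist.mean() + 3 * dist.std()
outliers = np.where(dist > threshold)[0]
```

The same distances can also rank records for manual review, and unusually small clusters can be screened simply by inspecting cluster sizes.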
The clustering techniques are further explained and presented in detail in the
next chapter.