is characterized by its centroid or cluster center. This is a virtual "prototype
record": the vector of the means of all clustering inputs. It is the central point of
the cluster. Cluster centers can be considered the most typical member cases of
each cluster and are therefore usually examined and explored for cluster profiling.
• Apart from simple tables and charts of descriptive statistics, revealed clusters
can also be profiled through classification modeling techniques that
examine the association of the clusters with the fields of interest. Decision trees,
due to their transparency and the intuitive form of their results, are commonly
applied to explore the structure of the clusters.
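A minimal sketch of the centroid computation described above, assuming each record is a list of numeric clustering inputs and already carries a cluster label. The helper name `cluster_centroids` and the data are hypothetical:

```python
from collections import defaultdict


def cluster_centroids(records, labels):
    """Return {cluster_label: centroid}, where each centroid is the vector of
    means of the clustering inputs over the records assigned to that cluster."""
    sums = {}
    counts = defaultdict(int)
    for record, label in zip(records, labels):
        if label not in sums:
            sums[label] = [0.0] * len(record)
        for i, value in enumerate(record):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


# Hypothetical data: two clustering inputs per customer.
records = [[10.0, 2.0], [12.0, 4.0], [50.0, 20.0], [54.0, 22.0]]
labels = ["A", "A", "B", "B"]
print(cluster_centroids(records, labels))  # {'A': [11.0, 3.0], 'B': [52.0, 21.0]}
```

Each centroid is the "prototype record" of its cluster, so comparing the centroids field by field is the starting point of a cluster profile.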
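The decision-tree profiling idea can be illustrated, under simplifying assumptions, with a depth-1 tree (a "stump") that searches for the single field and threshold that best separate one cluster from all other records, scored by Gini impurity. The helper names, fields and data are hypothetical; a real profile would normally grow a deeper tree with a dedicated tool such as scikit-learn's `DecisionTreeClassifier`.

```python
def gini(positives, total):
    """Gini impurity of a node holding `positives` target records out of `total`."""
    if total == 0:
        return 0.0
    p = positives / total
    return 2 * p * (1 - p)


def best_split(records, field_names, labels, target):
    """Depth-1 decision tree (stump): find the field and threshold that best
    separate members of cluster `target` from all other records, scored by
    the size-weighted Gini impurity of the two child nodes.
    Returns (field_name, threshold, impurity)."""
    y = [1 if label == target else 0 for label in labels]
    n = len(records)
    best = None
    for j, name in enumerate(field_names):
        values = sorted(set(record[j] for record in records))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [yi for record, yi in zip(records, y) if record[j] <= threshold]
            right = [yi for record, yi in zip(records, y) if record[j] > threshold]
            score = (len(left) * gini(sum(left), len(left))
                     + len(right) * gini(sum(right), len(right))) / n
            if best is None or score < best[2]:
                best = (name, threshold, score)
    return best


# Hypothetical profiling fields (age, monthly calls) and cluster memberships.
fields = ["age", "calls"]
records = [[25, 3], [30, 10], [45, 2], [50, 12], [28, 11], [52, 4]]
labels = ["C2", "C1", "C2", "C1", "C1", "C2"]
print(best_split(records, fields, labels, "C1"))  # ('calls', 7.0, 0.0)
```

Read as a rule, the result says that records with more than 7 calls fall in cluster C1, which is the kind of transparent description that makes trees attractive for profiling.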
Table 3.15 summarizes and compares the characteristics of the clustering
techniques presented in the previous sections, namely the K-means, TwoStep, and
Kohonen network techniques.
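To make the K-means entry of Table 3.15 more concrete, here is a minimal sketch of the iterative procedure with Euclidean distance (Lloyd's iterations), plus a loop over several values of k that mimics the repeated runs needed because the number of clusters is fixed in advance. The random initialization from k records, the seed, and the within-cluster sum of squares (WCSS) as a comparison measure are assumptions of this sketch, not details taken from the text.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(records, k, max_iterations=100, seed=0):
    """Plain K-means (Lloyd's iterations) with Euclidean distance.
    Initial centers are k randomly chosen records (an assumption of this sketch).
    Returns (centroids, labels, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = [list(record) for record in rng.sample(records, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each record joins its nearest cluster center.
        new_labels = [min(range(k), key=lambda c: euclidean(record, centroids[c]))
                      for record in records]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: each center moves to the mean of its member records.
        for c in range(k):
            members = [r for r, label in zip(records, labels) if label == c]
            if members:  # keep the previous center if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    wcss = sum(euclidean(r, centroids[label]) ** 2 for r, label in zip(records, labels))
    return centroids, labels, wcss


# Hypothetical two-input data; since the analyst must choose the number of
# clusters, several values of k are tried and compared.
data = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [8.0, 8.0], [8.5, 9.0], [9.0, 8.5]]
for k in range(1, 4):
    _, labels, wcss = kmeans(data, k)
    print(k, labels, round(wcss, 3))
```

WCSS tends to shrink as k grows, so in practice analysts look for the point where extra clusters stop paying off and also weigh how interpretable each candidate solution is.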
Table 3.15 Comparative table of clustering techniques.

Methodology description
• K-means: Iterative procedure based on a selected (typically Euclidean) distance measure.
• TwoStep: In the first phase, records are grouped into pre-clusters through a single data pass. In the second phase, the pre-clusters are further grouped into the final clusters through hierarchical clustering.
• Kohonen network/SOM: Based on neural networks. Clusters are spatially arranged in a grid map, with distances indicating their similarities.

Handling of categorical clustering fields
• K-means: Yes, through a recoding into indicator fields.
• TwoStep: Yes.
• Kohonen network/SOM: Yes, through a recoding into indicator fields.

Number of clusters
• K-means: Analysts specify in advance the number of clusters to fit; therefore, it requires multiple runs and tests.
• TwoStep: The number of clusters is automatically determined according to specific criteria.
• Kohonen network/SOM: Analysts specify the (maximum) number of output neurons. These neurons indicate the probable clusters.

(continued overleaf)
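For K-means and Kohonen networks, the table notes that categorical clustering fields are handled through a recoding into indicator fields. A minimal sketch of that recoding, assuming a single hypothetical categorical field such as a payment type; the helper name `to_indicator_fields` is illustrative:

```python
def to_indicator_fields(values):
    """Recode a categorical field into 0/1 indicator fields, one per category.
    Returns (category order, list of indicator vectors)."""
    categories = sorted(set(values))
    vectors = [[1 if value == category else 0 for category in categories]
               for value in values]
    return categories, vectors


categories, vectors = to_indicator_fields(["prepaid", "contract", "prepaid"])
print(categories)  # ['contract', 'prepaid']
print(vectors)     # [[0, 1], [1, 0], [0, 1]]
```

Each category becomes its own 0/1 field that can then enter the distance computations next to the numeric inputs; how such fields should be weighted or scaled against continuous inputs is a separate modeling decision.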
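The Kohonen network entry can be illustrated with a minimal self-organizing map sketch: output neurons sit on a rows x cols grid, each record pulls its best-matching neuron (and, more weakly, that neuron's grid neighbors) towards itself, and the learning rate and neighborhood radius shrink over time. The Gaussian neighborhood, the linear decay schedules, initialization from random records, and all parameter values are assumptions of this sketch rather than the settings of any particular tool.

```python
import math
import random


def best_matching_unit(weights, record):
    """Grid position of the neuron whose weight vector is closest (squared
    Euclidean distance) to the record."""
    return min(weights, key=lambda node: sum((w - x) ** 2
                                             for w, x in zip(weights[node], record)))


def train_som(records, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Minimal Kohonen self-organizing map on a rows x cols grid.
    Returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    radius0 = max(rows, cols) / 2
    # Initialize every neuron with a copy of a randomly chosen record.
    weights = {(i, j): list(rng.choice(records))
               for i in range(rows) for j in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                     # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)   # neighborhood shrinks over time
        for record in rng.sample(records, len(records)):  # shuffled pass
            bmu = best_matching_unit(weights, record)
            for node, w in weights.items():
                grid_dist2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_dist2 / (2 * radius ** 2))  # Gaussian neighborhood
                for d in range(len(w)):
                    w[d] += lr * h * (record[d] - w[d])
    return weights


# Hypothetical data mapped onto a 2 x 2 grid of output neurons.
data = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [8.0, 8.0], [8.5, 9.0], [9.0, 8.5]]
som = train_som(data, rows=2, cols=2)
for record in data:
    print(record, "->", best_matching_unit(som, record))
```

After training, each record is assigned to its best-matching neuron; the occupied neurons play the role of the probable clusters mentioned in the table, and neurons that are close on the grid tend to hold similar weight vectors.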