would be to use another clustering model, save the resulting cluster centers, and
load them as initial seeds for the K-means training.
On the other hand, the advantages of K-means include speed and scalability: it
is one of the fastest clustering models, and it efficiently handles long and wide
datasets, that is, datasets with many records and many input clustering fields.
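As a rough illustration of the seeding idea mentioned above, the sketch below derives cluster centers from the labels produced by some earlier clustering run, so that they could be supplied to K-means as initial seeds. It is plain Python, not IBM SPSS Modeler functionality; the function name centers_from_labels and the toy data are hypothetical.

```python
from collections import defaultdict

def centers_from_labels(records, labels):
    """Return {label: center}, where each center is the per-field mean
    of the records that an earlier clustering model put in that cluster."""
    groups = defaultdict(list)
    for record, label in zip(records, labels):
        groups[label].append(record)
    return {
        label: tuple(sum(col) / len(members) for col in zip(*members))
        for label, members in groups.items()
    }

# Toy data (hypothetical): four records with two input fields each,
# labelled by an earlier clustering model.
records = [(1.0, 2.0), (3.0, 4.0), (10.0, 10.0), (12.0, 14.0)]
labels = ["A", "A", "B", "B"]

# These centers would then be saved and loaded as the K-means seeds.
seeds = centers_from_labels(records, labels)
```

The number of seeds fixes the number of clusters that the subsequent K-means run will produce, so the earlier model's solution should already have a sensible cluster count.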
Recommended K-means Options
Figures 3.6 and 3.7 and Table 3.9 present the recommended options for the IBM
SPSS Modeler K-means modeling node. As outlined earlier, a key characteristic of
Table 3.9 IBM SPSS Modeler recommended K-means options.

Option: Number of clusters
Setting: Requires trial and error. Analysts should evaluate different clustering solutions
Functionality/reasoning for selection: Analysts should experiment with different numbers of clusters to find the solution that best fits their specific business goals. Worth trying alternatives: users can get an indication of the underlying number of clusters by trying other clustering techniques which incorporate specific criteria and automatically detect the number of clusters

Option: Generate distance field
Setting: Selected
Functionality/reasoning for selection: It generates an additional field which denotes the distance of each record from the center of the assigned cluster. It can be used to assess whether a case is a typical representative of its cluster or lies far apart from the rest of the other members

Option: Show cluster proximity
Setting: Selected
Functionality/reasoning for selection: It creates a proximity matrix of the distances between the cluster centers. It can be used to assess the separation of the revealed clusters

Option: Maximum iterations and change tolerance
Setting: Keep the defaults, unless
Functionality/reasoning for selection: The algorithm stops if, after a data pass (iteration), the cluster centers do not change (0 change tolerance) or after the specified number of iterations has been completed, regardless of the change of the cluster centers. Users should browse the results and examine the solution steps and the change of the cluster centers at each iteration. In the case of non-convergence after the specified number of iterations, they should increase the specified number of iterations and rerun the algorithm
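To make the options in Table 3.9 more concrete, the following is a minimal sketch in plain Python of what a distance field, a cluster proximity matrix, and the two stopping criteria amount to. It is not the IBM SPSS Modeler implementation: the function names, the default of 20 iterations, the use of Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import math

def nearest(record, centers):
    """Return (index of the closest center, distance to that center)."""
    dists = [math.dist(record, c) for c in centers]
    k = min(range(len(centers)), key=dists.__getitem__)
    return k, dists[k]

def kmeans(records, seeds, max_iter=20, tol=0.0):
    """Plain K-means. Stops when no center moves by more than `tol`
    in a data pass, or after `max_iter` passes, whichever comes first.
    Returns (centers, shifts); shifts[i] is the largest center movement
    observed in pass i."""
    centers = [tuple(s) for s in seeds]
    shifts = []
    for _ in range(max_iter):
        # Assign every record to its nearest center.
        labels = [nearest(r, centers)[0] for r in records]
        # Recompute each center as the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [r for r, lab in zip(records, labels) if lab == k]
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else old  # an empty cluster keeps its center
            )
        shifts.append(max(math.dist(o, n) for o, n in zip(centers, new_centers)))
        centers = new_centers
        if shifts[-1] <= tol:
            break
    return centers, shifts

# Toy data (hypothetical): two well-separated groups of three records.
records = [(1.0, 1.0), (1.0, 3.0), (3.0, 2.0),
           (9.0, 9.0), (9.0, 11.0), (11.0, 10.0)]
centers, shifts = kmeans(records, seeds=[(0.0, 0.0), (10.0, 10.0)])

# Non-convergence check: if the last pass still moved a center,
# raise max_iter and rerun.
converged = shifts[-1] <= 0.0

# Distance field: distance of each record from its assigned center.
distance_field = [nearest(r, centers)[1] for r in records]

# Cluster proximity: pairwise distances between the cluster centers.
proximity = [[math.dist(a, b) for b in centers] for a in centers]
```

Inspecting shifts pass by pass mirrors the advice to examine the change of the cluster centers at each iteration, while large values in distance_field flag records that sit far from the rest of their cluster.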
...