including the middle 50% of the entire population. Overlaid on the background
boxplots are the boxplots for the selected clusters. Point markers indicate the
corresponding medians, and the horizontal spans indicate the interquartile ranges. As
shown in the figure, cluster 3 is characterized by relatively high values on principal
component 1 and lower values on component 2. Thus, if we recall the interpretation
of the components presented earlier, cluster 3 mostly includes heavy SMS users
with low voice traffic. In contrast, cluster 2 presents relatively higher values
on component 2 and hence increased voice usage.
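The quantities behind such overlaid boxplots can be sketched in a few lines. The example below is a minimal illustration, not the Cluster Viewer's own computation: the scores and cluster labels are made-up values, and the helper name median_and_iqr is hypothetical. It summarizes each cluster by its median and quartiles so they can be compared with those of the entire population.

```python
from statistics import median, quantiles

def median_and_iqr(values):
    """Return (median, first quartile, third quartile) for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3

# Hypothetical component 1 scores and cluster labels (illustrative data only).
scores = [-1.2, -0.8, -0.5, -0.1, 0.0, 0.3, 0.9, 1.4, 1.8, 2.1]
labels = [1, 1, 2, 2, 1, 2, 3, 3, 3, 3]

# Background boxplot: summary of the entire population.
overall = median_and_iqr(scores)

# Group the scores by cluster label.
by_cluster = {}
for score, label in zip(scores, labels):
    by_cluster.setdefault(label, []).append(score)

# Overlaid boxplots: one summary per cluster.
summary = {label: median_and_iqr(vals) for label, vals in by_cluster.items()}

for label in sorted(summary):
    med, q1, q3 = summary[label]
    print(f"cluster {label}: median={med:.2f}, IQR=[{q1:.2f}, {q3:.2f}]")
```

A cluster whose median lies well above the overall median, as the hypothetical cluster 3 does here, would be read as scoring relatively high on that component.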
Another useful Cluster Viewer graph is shown in Figure 3.17. It examines the
distribution of the component 3 scores (roaming usage) for cluster 4. The darker
curve represents the cluster 4 distribution and is overlaid on the curve of the entire
customer base. The cluster 4 curve lies on the right tail of the overall population
curve, indicating increased roaming usage for those customers.
ADDITIONAL PROFILING SUGGESTIONS
Even if the clustering solution has been built on principal component scores/factors,
it is always a good idea to go back to the original inputs and summarize the clusters in
terms of the original data. Usually, the component scores can give an overview and
a useful first insight into the meaning of the clusters; however, examining clusters
with respect to the original attributes can provide a more direct interpretation
that can be more easily communicated. Original fields can be standardized before
profiling for easier and more effective comparison, especially if they are measured
on different scales and have different variability.
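As a rough sketch of this suggestion, the original fields can be converted to z-scores and then averaged within each cluster. The field names, values, and cluster labels below are hypothetical and serve only to illustrate the idea; they are not taken from the case study.

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert raw values to z-scores using the overall mean and standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical original attributes, measured on different scales (illustrative only).
records = {
    "sms_count": [300, 280, 20, 15, 10, 25],
    "voice_minutes": [50, 60, 400, 420, 90, 110],
}
clusters = [3, 3, 2, 2, 1, 1]  # assumed cluster label of each record

# Mean z-score of every field within every cluster.
profile = {}
for field, values in records.items():
    z_scores = standardize(values)
    for label in sorted(set(clusters)):
        members = [z for z, c in zip(z_scores, clusters) if c == label]
        profile.setdefault(label, {})[field] = mean(members)

for label, fields in profile.items():
    row = ", ".join(f"{name}={value:+.2f}" for name, value in fields.items())
    print(f"cluster {label}: {row}")
```

Because every field is expressed in standard deviation units, a positive value means the cluster is above the overall average on that field and a negative value means it is below, regardless of the original measurement scale.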
Table 3.13 refers to the six clusters of our simple telecommunications case
study and summarizes the averages of important original attributes over the records
of each cluster. The results of this table reinforce the ones indicated by the Cluster
Viewer graphs.
The profiling process can also be enriched and facilitated by charts like
those suggested in the following sections. For instance, a nice visual exploration
of the cluster solution, applicable to continuous profiling attributes, is provided
by plotting the percentage deviation of each cluster from the overall mean values.
The respective set of plots for the six clusters of the telecommunications example
is presented in Figure 3.18.
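The values behind such a plot can be computed as follows. This is a minimal sketch under stated assumptions: the attribute values and cluster labels are made up, the helper pct_deviation is a hypothetical name, and the deviation is taken as 100 × (cluster mean − overall mean) / overall mean, which requires a nonzero overall mean.

```python
from statistics import mean

def pct_deviation(cluster_mean, overall_mean):
    """Percentage deviation of a cluster mean from the overall mean (overall mean must be nonzero)."""
    return 100.0 * (cluster_mean - overall_mean) / overall_mean

# Hypothetical continuous profiling attribute and cluster labels (illustrative only).
voice_minutes = [50, 60, 400, 420, 90, 110]
clusters = [3, 3, 2, 2, 1, 1]

overall = mean(voice_minutes)

deviations = {}
for label in sorted(set(clusters)):
    members = [v for v, c in zip(voice_minutes, clusters) if c == label]
    deviations[label] = pct_deviation(mean(members), overall)

for label, dev in deviations.items():
    print(f"cluster {label}: {dev:+.1f}% vs. overall mean")
```

Plotting these percentages as bars, one chart per cluster and one bar per attribute, gives a view in which positive bars mark attributes where the cluster exceeds the overall mean.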
Another useful set of profiling charts and statistical tests is offered by
the SPSS AIM command. The mean values of the selected continuous profiling
attributes are compared to the overall population values (based on t-tests) and
statistically significant differences are detected, enabling analysts to focus on the
most differentiating fields. A sample output from the SPSS AIM command is
shown in Figure 3.19, presenting the results for cluster 3 in our example. The