and information on channel utilization, were not included in the model training
procedure since they would just confound the separation and lead the analytical
process away from the specific business goal. However, in the end all available
information of interest was taken into account during the cluster profiling and
evaluation phase.
Moreover, categorical fields were also omitted from the clustering procedure,
since, as mentioned in previous chapters, they tend to provide biased clustering
solutions which overlook differences attributable to other inputs.
THE ANALYTICAL PROCESS
The segmentation process comprised two steps. First, PCA was
applied to reveal the distinct data dimensions underlying the 41 inputs listed
above. Then a clustering model was used to derive the final segmentation solution.
PCA, although optional, is a useful data preparation step aimed at data
reduction.
The extracted principal components, once explained and fully understood,
were used as clustering inputs instead of the original fields. This was the second
and final step of the analytical process: a clustering model assessed the similarities
of the records/customers in terms of the revealed components and suggested
the underlying customer groupings. The proposed clusters were then interpreted
and evaluated, mainly in terms of their business meaning and usefulness, before
the final solution was adopted for the organization.
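The two-step flow described above can be illustrated with a minimal sketch. It assumes NumPy is available, uses synthetic data in place of the 41 customer inputs, and uses a plain k-means routine as a stand-in for the clustering model, since this section does not name the algorithm. The function names `pca_scores` and `kmeans`, the number of components, and the number of clusters are all hypothetical choices made for illustration.

```python
import numpy as np

def pca_scores(X, n_components):
    """Standardize X and project it onto its leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    return Z @ eigvecs[:, order], eigvals[order]

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd-style k-means; returns cluster labels and centroids."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Synthetic stand-in for the customer table (hypothetical, not the book's data)
rng = np.random.default_rng(42)
latent = rng.normal(size=(300, 3))
X = latent @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(300, 8))

# Step 1: reduce the inputs to component scores. Step 2: cluster on the scores.
scores, eigvals = pca_scores(X, n_components=3)
labels, centroids = kmeans(scores, k=4)
print(scores.shape, np.bincount(labels, minlength=4))
```

In this sketch the clustering step never sees the original fields, only the component scores, which mirrors the substitution of components for inputs described above.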
Identifying the Segmentation Dimensions with PCA/Factor Analysis
The team involved in the project selected PCA as the data reduction method. The
components extracted by PCA are uncorrelated linear combinations of the original
inputs. They are extracted in order of importance, with the first one carrying the
largest part of the variance of the original fields. The subsequent components
explain smaller portions of the total variance and are uncorrelated with each other.
The analysts also applied a Varimax rotation in order to simplify
interpretation of the components.
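To make the rotation step more concrete, the following is a minimal sketch of a Varimax rotation, assuming NumPy and the widely used SVD-based iteration. It is not the project's actual implementation: the function name `varimax`, its parameters, and the toy loading matrix are hypothetical, and the sketch omits the Kaiser normalization that some statistical packages apply.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Return (rotated_loadings, rotation_matrix) from an SVD-based varimax iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like matrix of the varimax criterion for the current rotation
        grad = loadings.T @ (
            rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        )
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt                 # nearest orthogonal matrix to grad
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break                         # no meaningful improvement: stop
        criterion = new_criterion
    return loadings @ rotation, rotation

# Toy loadings: a clean two-component structure, deliberately mixed by 30 degrees
angle = np.deg2rad(30)
mix = np.array([[np.cos(angle), -np.sin(angle)],
                [np.sin(angle),  np.cos(angle)]])
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                   [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
unrotated = simple @ mix

rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, the rotated components remain uncorrelated and each input's communality (its row sum of squared loadings) is unchanged; only the way the explained variance is shared among the components shifts, which is what tends to make the loadings easier to read.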
The PCA algorithm analyzed the inputs' intercorrelations and extracted 13
components which accounted for almost 85% of the variance/information of the
original fields, a large step toward simplicity with minimal loss of information.
The amount of information retained by the extracted solution is summarized in
Table 6.15.
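As an illustration of how such a summary can be produced, the sketch below computes the eigenvalues of a correlation matrix together with the individual and cumulative percentage of variance explained. It assumes NumPy and synthetic data; the function name `variance_table` and the data are hypothetical and do not reproduce the figures of the case study.

```python
import numpy as np

def variance_table(X):
    """Eigenvalues of the correlation matrix with % and cumulative % of variance."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]   # largest first
    pct = 100 * eigvals / eigvals.sum()
    return eigvals, pct, np.cumsum(pct)

# Synthetic inputs (hypothetical): 6 fields driven by 2 underlying dimensions
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(200, 6))

eigvals, pct, cum_pct = variance_table(X)
for i, (ev, p, c) in enumerate(zip(eigvals, pct, cum_pct), start=1):
    print(f"component {i}: eigenvalue={ev:.3f}  variance={p:.1f}%  cumulative={c:.1f}%")
```

Since the analysis is based on the correlation matrix, the eigenvalues sum to the number of inputs, so each eigenvalue can be read as the number of original fields' worth of variance that a component carries. These are the same kinds of quantities as those summarized in Table 6.15.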
Table 6.15 lists the eigenvalues and the percentage of variance (individual and
cumulative) explained by each extracted component. The criterion used to
determine the number of components to extract was the eigenvalue (or latent root)