This approach was adopted to eliminate the risk of deriving a biased solution
due to correlated original inputs. It also ensures a balanced solution, to which
all data dimensions contribute equally, and it simplifies the otherwise tedious
task of interpreting the clusters.
Specifically, a PCA model with Varimax rotation was applied to the original
segmentation fields. The solution finally selected, after many trials, included 12
extracted components and was based on the "eigenvalue over 1" criterion. This
solution was retained mainly because it produced a relatively small number of
meaningful components without sacrificing much of the information in the
original fields.
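The Varimax rotation mentioned above can be illustrated with a short sketch. It is a minimal, hypothetical example in pure Python, assuming raw Varimax (without the Kaiser row normalization that some statistical packages apply by default) computed with Kaiser's pairwise planar rotations; it is not necessarily the procedure the organization's software used, and the function name `varimax` and its parameters `tol` and `max_sweeps` are illustrative. Each step is an orthogonal rotation, so the communality of every field (the sum of its squared loadings) is unchanged; only the way the loadings are spread across components shifts toward a simpler, more interpretable structure.

```python
import math
from itertools import combinations


def varimax(loadings, tol=1e-8, max_sweeps=100):
    """Hypothetical sketch: raw Varimax via Kaiser's pairwise rotations.

    loadings: list of rows, one per original field, each holding k component loadings.
    Returns a rotated copy with the same shape (no Kaiser normalization applied).
    """
    L = [list(row) for row in loadings]  # work on a copy, leave the input untouched
    p = len(L)
    k = len(L[0]) if p else 0
    for _ in range(max_sweeps):
        largest_angle = 0.0
        # Rotate every pair of components (j, m) in its own plane.
        for j, m in combinations(range(k), 2):
            u = [L[i][j] ** 2 - L[i][m] ** 2 for i in range(p)]
            v = [2 * L[i][j] * L[i][m] for i in range(p)]
            A, B = sum(u), sum(v)
            C = sum(a * a - b * b for a, b in zip(u, v))
            D = 2 * sum(a * b for a, b in zip(u, v))
            num = D - 2 * A * B / p
            den = C - (A * A - B * B) / p
            # Angle that maximizes the Varimax criterion for this pair of columns.
            phi = math.atan2(num, den) / 4
            largest_angle = max(largest_angle, abs(phi))
            c, s = math.cos(phi), math.sin(phi)
            for row in L:
                x, y = row[j], row[m]
                row[j], row[m] = x * c + y * s, -x * s + y * c
        # Stop once a full sweep no longer changes the orientation noticeably.
        if largest_angle < tol:
            break
    return L
```

For example, feeding it the loadings of a two-component "simple structure" (each field loading on only one component) that has been rotated by 30 degrees recovers loadings in which each field again loads on a single component, up to the order and sign of the components.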
Before using the derived components, and thereby replacing more than 50 fields
with just a dozen components, the organization's data miners wanted to be sure
that the PCA solution retained most of the original information. They therefore
began by examining the table of "variance explained" (Table 7.6).
Table 7.6 Deciding the number of extracted components by examining the
variance explained.

Total variance explained
Component  Eigenvalue  % of variance  Cumulative %
1          14.85       27.50          27.50
2          7.59        14.06          41.55
3          4.36        8.07           49.63
4          3.94        7.30           56.92
5          2.27        4.21           61.13
6          1.87        3.45           64.58
7          1.82        3.37           67.96
8          1.71        3.18           71.14
9          1.56        2.89           74.03
10         1.42        2.63           76.66
11         1.34        2.48           79.14
12         1.10        2.04           81.17
13         0.98        1.81           82.98
14         0.93        1.72           84.73
15         0.86        1.60           86.33
16         0.77        1.42           87.75
17         0.72        1.33           89.08
...        ...         ...            ...
54         0.00        0.00           100.00
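The link between the eigenvalues, the "eigenvalue over 1" rule and the variance percentages in Table 7.6 can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the PCA was run on standardized fields, so that total variance equals the number of fields; the count of 54 is inferred from the table's last row, and the table is consistent with it (14.85 / 54 = 27.50%). Only the 17 eigenvalues listed in the table are used, since the rest are elided; because eigenvalues are sorted in descending order, these suffice to apply the rule. The function names `kaiser_count` and `variance_explained` are hypothetical.

```python
# Eigenvalues of components 1-17 as listed in Table 7.6 (components 18-54 are elided).
EIGENVALUES = [14.85, 7.59, 4.36, 3.94, 2.27, 1.87, 1.82, 1.71, 1.56,
               1.42, 1.34, 1.10, 0.98, 0.93, 0.86, 0.77, 0.72]
N_FIELDS = 54  # assumption: 54 standardized input fields, inferred from the table's last row


def kaiser_count(eigenvalues, threshold=1.0):
    """Number of components whose eigenvalue exceeds the threshold ("eigenvalue over 1")."""
    return sum(1 for ev in eigenvalues if ev > threshold)


def variance_explained(eigenvalues, n_fields):
    """(eigenvalue, % of variance, cumulative %) per component.

    With PCA on standardized fields, total variance equals the number of fields.
    """
    rows, cumulative = [], 0.0
    for ev in eigenvalues:
        pct = 100.0 * ev / n_fields
        cumulative += pct
        rows.append((ev, pct, cumulative))
    return rows


k = kaiser_count(EIGENVALUES)
table = variance_explained(EIGENVALUES, N_FIELDS)
print(k)                             # 12
print(round(table[k - 1][2], 2))     # 81.17
```

Running it reports 12 components retaining about 81.17% of the total variance, matching the twelfth row of Table 7.6. Later rows may differ from the table in the second decimal because the listed eigenvalues are themselves rounded.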