Table 7.30 Quality of CA biplot display of concatenated 2001/02 and 2007/08 crime data.
             Dim1  Dim2  Dim3  Dim4  Dim5  Dim6  Dim7  Dim8  Dim9  Dim10 Dim11 Dim12
Quality (%)  48.6  84.7  91.4  94.7  96.8  98.0  98.9  99.4  99.7  99.9  99.9  100.0
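Suppose the quality in Table 7.30 is the usual cumulative percentage of total inertia, 100(σ_1² + ... + σ_r²)/Σσ_k², where the σ_k are the singular values of the matrix of standardized (Pearson) residuals. Under that assumption, it can be sketched in Python/NumPy as follows. This is not the book's R code. The helper names and the small 4 x 5 table are hypothetical, and the crime data are not reproduced here.

```python
import numpy as np

def standardized_residuals(X):
    """S = D_r^{-1/2} (P - r c^T) D_c^{-1/2} for a contingency table X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    P = X / X.sum()                  # correspondence matrix
    r = P.sum(axis=1)                # row masses
    c = P.sum(axis=0)                # column masses
    return (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

def display_quality(X):
    """Cumulative percentage of total inertia in 1, 2, ... dimensions (hypothetical helper)."""
    s = np.linalg.svd(standardized_residuals(X), compute_uv=False)
    inertia = s ** 2                 # principal inertias, one per dimension
    return 100 * np.cumsum(inertia) / inertia.sum()

# Hypothetical 4 x 5 table of counts (not the crime data)
X = np.array([[20, 15, 8, 30, 12],
              [10, 25, 14, 9, 18],
              [7, 11, 30, 16, 6],
              [15, 9, 12, 22, 28]])

q = display_quality(X)
print(np.round(q, 1))
```

Because the residual matrix of an I x J table has rank at most min(I, J) - 1, the quality reaches 100% after that many dimensions. It is non-decreasing before then, which is the pattern in Table 7.30.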
Table 7.31 Column predictivities of the CA biplot display of concatenated 2001/02 and 2007/08 crime data.
       Arsn  AGBH  AtMr  BNRs  BRs   CrJk  CmAs  CmRb  DrgR  InAs  Mrd   PubV  Rape  RAC
Dim_1  0.146 0.116 0.175 0.015 0.038 0.123 0.042 0.494 0.990 0.640 0.024 0.232 0.113 0.060
Dim_2  0.271 0.784 0.183 0.455 0.047 0.949 0.282 0.729 0.998 0.644 0.031 0.423 0.350 0.982
Dim_3  0.661 0.947 0.270 0.545 0.081 0.952 0.877 0.767 0.999 0.684 0.343 0.423 0.611 0.987
Dim_4  0.661 0.987 0.369 0.581 0.730 0.970 0.948 0.802 1.000 0.699 0.585 0.435 0.629 0.997
Dim_5  0.693 0.993 0.888 0.608 0.956 0.971 0.979 0.812 1.000 0.701 0.799 0.517 0.632 0.997
Dim_6  0.723 0.995 0.889 0.977 0.965 0.972 0.994 0.818 1.000 0.753 0.898 0.533 0.724 0.997
Dim_7  0.879 1.000 0.979 0.978 0.965 0.976 0.999 0.852 1.000 0.757 0.972 0.663 0.968 0.997
Dim_8  0.886 1.000 0.991 0.979 1.000 0.983 1.000 0.969 1.000 0.821 0.987 0.684 0.968 0.997
Dim_9  0.890 1.000 0.992 0.989 1.000 0.994 1.000 0.997 1.000 0.891 0.989 0.783 0.975 1.000
Dim_10 0.953 1.000 0.995 0.995 1.000 0.998 1.000 0.999 1.000 0.991 0.994 0.820 0.977 1.000
Dim_11 0.983 1.000 1.000 0.999 1.000 1.000 1.000 1.000 1.000 0.992 0.998 0.863 0.995 1.000
Dim_12 0.987 1.000 1.000 0.999 1.000 1.000 1.000 1.000 1.000 0.992 0.998 1.000 1.000 1.000
Dim_13 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
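Column predictivities of the kind shown in Table 7.31 can be illustrated with one common definition: the cumulative squared cosine (cos²) of each column point. This is the share of the column's squared chi-squared distance from the centroid that is reproduced in the first r dimensions. The book's own definition may weight the columns differently depending on the ca.variant setting, so treat this as an assumption. The helper name and the table are hypothetical.

```python
import numpy as np

def column_predictivities(X):
    """Cumulative cos^2 of the column points (hypothetical helper).

    Entry [k, j] is the share of column j's squared chi-squared distance to
    the centroid that is reproduced in the first k + 1 dimensions.
    """
    X = np.asarray(X, dtype=float)
    P = X / X.sum()
    r = P.sum(axis=1)
    c = P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
    _, s, Vt = np.linalg.svd(S, full_matrices=False)
    contrib = (s[:, None] * Vt) ** 2     # sigma_k^2 * v_jk^2, shape (dims, columns)
    return np.cumsum(contrib, axis=0) / contrib.sum(axis=0)

# Hypothetical 4 x 5 table of counts (not the crime data)
X = np.array([[20, 15, 8, 30, 12],
              [10, 25, 14, 9, 18],
              [7, 11, 30, 16, 6],
              [15, 9, 12, 22, 28]])

pred = column_predictivities(X)
print(np.round(pred, 3))
```

With this definition every column's predictivity is non-decreasing in the number of dimensions and equals 1 at full dimensionality. That matches the behaviour of every column in Table 7.31.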
the change ca.variant="Chisq2Cols" in the above calls. Both figures show two clusters of attributes, (1, 2, 5) and (3, 4, 6), indicating similar column profiles within these clusters. Thus the doubling process has preserved this information, but Figure 7.34 clearly shows the bipolar structure of the variables. Recall that the points representing the positive and negative forms of an attribute lie at the extremities of a line through the mean. The mean is their weighted average, with weights in the ratio $c_k : (Np - c_k)$. It can be seen that the proportion $c_k/(Np)$ is close to 0.5 for all attributes, so there is little polarization in this data set. Had some of these proportions been closer to zero or unity, the column chi-squared analyses before and after doubling would have differed more. Note also that the plus pole is closer to the point of intersection of the lines for attributes 1 and 5, whereas the minus pole is closer to the intersection point in all other cases. This shows that if votes are aggregated over the four products, only attributes 1 and 5 receive a majority of positive votes.
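The geometry of the two poles can be checked numerically. The sketch below uses a hypothetical set-up: p = 4 products, each scored by N = 20 respondents on 6 attributes. Here c_k is taken to be the total number of positive votes for attribute k. The sketch doubles the table into "+" and "-" columns and computes column principal coordinates (a column chi-squared analysis) with NumPy. It then verifies that c_k g_k+ + (Np - c_k) g_k- = 0. So the mean lies on the line joining the poles, and the pole with the larger weight is the closer one. The data and helper names are invented for illustration and are not the book's example or R code.

```python
import numpy as np

def double(counts, n_respondents):
    """Doubled table with columns k1+, k1-, k2+, k2-, ... (hypothetical helper)."""
    counts = np.asarray(counts, dtype=float)
    plus, minus = counts, n_respondents - counts
    return np.column_stack([col for k in range(counts.shape[1])
                            for col in (plus[:, k], minus[:, k])])

def column_principal_coords(X):
    """Column principal coordinates G = D_c^{-1/2} V Sigma (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    P = X / X.sum()
    r = P.sum(axis=1)
    c = P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    _, s, Vt = np.linalg.svd(S, full_matrices=False)
    return (Vt.T * s) / np.sqrt(c)[:, None]

# Hypothetical "+" votes: 4 products (rows) x 6 attributes, N = 20 respondents
counts = np.array([[14, 12, 8, 9, 15, 7],
                   [13, 11, 9, 6, 12, 8],
                   [9, 10, 10, 11, 13, 9],
                   [12, 8, 7, 10, 11, 10]])
N = 20
Np = N * counts.shape[0]
c = counts.sum(axis=0)                      # c_k: total "+" votes per attribute

X = double(counts, N)
G = column_principal_coords(X)
for k in range(counts.shape[1]):
    g_plus, g_minus = G[2 * k], G[2 * k + 1]
    # the mean (origin) is the weighted average of the two poles
    assert np.allclose(c[k] * g_plus + (Np - c[k]) * g_minus, 0)
    d_plus, d_minus = np.linalg.norm(g_plus), np.linalg.norm(g_minus)
    print(f"attribute {k + 1}: c_k/Np = {c[k] / Np:.2f}, "
          f"plus pole closer: {d_plus < d_minus}")
```

Because the weights are c_k and Np - c_k, the distance ratio |g_k+| : |g_k-| is (Np - c_k) : c_k. The plus pole is therefore closer to the mean exactly when c_k > Np/2, that is, when positive votes are in the majority.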
Doubling is an ingenious idea, but a final comment might be that it underlines the inadequacy of chi-squared distance when absolute values, rather than ratios, are important.
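The remark about ratios follows from chi-squared distance being a distance between profiles. Two columns whose counts are proportional have identical profiles and hence zero distance, however different their totals are. A minimal NumPy sketch on a hypothetical 3 x 3 table makes this concrete. The helper name is hypothetical.

```python
import numpy as np

def column_chisq_distance(X, j, l):
    """Chi-squared distance between the profiles of columns j and l (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    P = X / X.sum()
    r = P.sum(axis=1)                  # row masses define the chi-squared metric
    c = P.sum(axis=0)
    prof_j, prof_l = P[:, j] / c[j], P[:, l] / c[l]
    return np.sqrt(np.sum((prof_j - prof_l) ** 2 / r))

# Hypothetical table: column 1 is ten times column 0; column 2 has a different shape
X = np.array([[2, 20, 5],
              [4, 40, 4],
              [6, 60, 3]])
print(column_chisq_distance(X, 0, 1))   # zero up to rounding: same profile
print(column_chisq_distance(X, 0, 2))   # positive: different profile
```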
7.7 Conclusion
The previous section exemplifies the versatility of the R code. In practice the main uses of correspondence analysis are concerned with approximating chi-squared distance and, to a lesser extent, with approximating the Pearson residuals. We recommend the latter, but admit that in being less than enthusiastic about chi-squared distance we are in a minority.
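Both uses start from the same singular value decomposition of the standardized residuals S = D_r^{-1/2}(P - rc')D_c^{-1/2}, which are the Pearson residuals divided by √n. They differ in how the singular vectors are scaled. Truncating UΣV' gives a least-squares approximation of the residuals themselves. Row principal coordinates D_r^{-1/2}UΣ, in full dimensionality, reproduce chi-squared distances between row profiles, and truncating them gives approximations of those distances. The NumPy sketch below checks both facts on a hypothetical table. It is an illustration, not the book's R code.

```python
import numpy as np

# Hypothetical 4 x 5 contingency table (not data from this chapter)
X = np.array([[20, 15, 8, 30, 12],
              [10, 25, 14, 9, 18],
              [7, 11, 30, 16, 6],
              [15, 9, 12, 22, 28]], dtype=float)
n = X.sum()

# Pearson residuals (x_ij - e_ij) / sqrt(e_ij) under independence
E = np.outer(X.sum(axis=1), X.sum(axis=0)) / n
pearson = (X - E) / np.sqrt(E)
chi2 = np.sum(pearson ** 2)                  # Pearson chi-squared statistic

# Standardized residuals and their SVD
P = X / n
r, c = P.sum(axis=1), P.sum(axis=0)
S = pearson / np.sqrt(n)
U, s, Vt = np.linalg.svd(S, full_matrices=False)

# Use 1: rank-2 least-squares approximation of the standardized residuals
S2 = (U[:, :2] * s[:2]) @ Vt[:2]
fit_residuals = np.sum(S2 ** 2) / np.sum(S ** 2)

# Use 2: row principal coordinates, whose distances are chi-squared distances
F = (U * s) / np.sqrt(r)[:, None]

def row_chisq_distance(i, h):
    """Chi-squared distance between the profiles of rows i and h."""
    a_i, a_h = P[i] / r[i], P[h] / r[h]
    return np.sqrt(np.sum((a_i - a_h) ** 2 / c))

print(f"chi2 = {chi2:.3f}, n * total inertia = {n * np.sum(s ** 2):.3f}")
print(f"share of squared residuals in 2 dims = {fit_residuals:.3f}")
print(np.isclose(np.linalg.norm(F[0] - F[1]), row_chisq_distance(0, 1)))
```

Using F[:, :2] instead of F gives the two-dimensional approximation of chi-squared distances. The same two singular values govern both approximations.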