where
$$
n = \sum_{i=1}^{p}\sum_{j=1}^{q} x_{ij},
$$
and, using the weighted identity on the expression in square brackets of (7.23), we have
$$
\chi^2 = n \sum_{i<i'}^{p} \sum_{j=1}^{q} \frac{1}{x_{\cdot j}}\, x_{i\cdot}\, x_{i'\cdot} \left( \frac{x_{ij}}{x_{i\cdot}} - \frac{x_{i'j}}{x_{i'\cdot}} \right)^{2}
= n \sum_{i<i'}^{p} x_{i\cdot}\, x_{i'\cdot} \left[ \sum_{j=1}^{q} \frac{1}{x_{\cdot j}} \left( \frac{x_{ij}}{x_{i\cdot}} - \frac{x_{i'j}}{x_{i'\cdot}} \right)^{2} \right].
\qquad (7.24)
$$
The expression in the square brackets on the right-hand side of (7.24) is the chi-squared distance (7.12) between the $i$th and $i'$th rows of $\mathbf{X}$. Thus we have the simple result that
$$
\chi^2 = n \sum_{i<i'}^{p} x_{i\cdot}\, x_{i'\cdot}\, d_{ii'}^2 = \frac{n}{2}\, \mathbf{1}'\mathbf{R}\mathbf{D}\mathbf{R}\mathbf{1},
\qquad (7.25)
$$
where $\mathbf{D} = \{d_{ii'}^2\}$ is the $p \times p$ matrix of all the row chi-squared distances (7.12). Similarly, for the column chi-squared distances, we have
$$
\chi^2 = n \sum_{j<j'}^{q} x_{\cdot j}\, x_{\cdot j'}\, d_{jj'}^2 = \frac{n}{2}\, \mathbf{1}'\mathbf{C}\mathbf{D}\mathbf{C}\mathbf{1},
$$

where now $\mathbf{D} = \{d_{jj'}^2\}$ is the $q \times q$ matrix of all the column chi-squared distances (7.18). These results link the chi-squared distances to the total Pearson's $\chi^2$ for $\mathbf{X}$.
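These identities can be checked numerically. The sketch below uses an invented contingency table, and it assumes one scaling convention for the $x$'s in (7.24)–(7.25), namely that they denote relative frequencies (the entries of the table divided by $n$); under that convention $(n/2)\,\mathbf{1}'\mathbf{R}\mathbf{D}\mathbf{R}\mathbf{1}$ reproduces Pearson's $\chi^2$ computed directly from the raw counts.

```python
import numpy as np

# Invented 3x4 contingency table of counts (hypothetical data).
X = np.array([[20, 10,  5, 15],
              [10, 25, 10,  5],
              [ 5, 10, 30, 10]], dtype=float)
n = X.sum()

# Pearson's chi-squared computed directly from the counts.
E = np.outer(X.sum(axis=1), X.sum(axis=0)) / n
chi2_direct = ((X - E) ** 2 / E).sum()

# Right-hand side of (7.25): (n/2) * 1' R D R 1, taking the x's as
# relative frequencies X/n (an assumed scaling convention).
P = X / n
r = P.sum(axis=1)              # row margins x_{i.}
c = P.sum(axis=0)              # column margins x_{.j}
profiles = P / r[:, None]      # row profiles x_{ij}/x_{i.}
p_rows = P.shape[0]
D = np.zeros((p_rows, p_rows))
for i in range(p_rows):
    for k in range(p_rows):
        diff = profiles[i] - profiles[k]
        D[i, k] = np.sum(diff ** 2 / c)   # squared chi-squared distance
R = np.diag(r)
one = np.ones(p_rows)
chi2_from_distances = n / 2 * (one @ R @ D @ R @ one)

print(chi2_direct, chi2_from_distances)
```

The two printed values agree, since summing over all ordered pairs in $\mathbf{1}'\mathbf{R}\mathbf{D}\mathbf{R}\mathbf{1}$ counts each unordered pair $i<i'$ twice, which the factor $1/2$ removes.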
7.2.5 Canonical correlation approximation
Probably the oldest derivation of CA is due to Hirschfeld (1935), who asked what quantification of the categorical levels of the two variables classifying the contingency table maximized their correlation. To express this idea algebraically, we define two indicator matrices, $\mathbf{G}_1$ and $\mathbf{G}_2$, of sizes $n \times p$ and $n \times q$ respectively, identifying row and column membership of the $n$ cases.
In terms of our previous notation, we have
$$
\mathbf{X} = \mathbf{G}_1'\mathbf{G}_2, \qquad \mathbf{R} = \mathbf{G}_1'\mathbf{G}_1, \qquad \mathbf{C} = \mathbf{G}_2'\mathbf{G}_2.
$$
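To make the indicator notation concrete, here is a minimal sketch (the six cases and their category labels are invented) that builds $\mathbf{G}_1$ and $\mathbf{G}_2$ and confirms that $\mathbf{G}_1'\mathbf{G}_2$ cross-tabulates the cases while $\mathbf{G}_1'\mathbf{G}_1$ and $\mathbf{G}_2'\mathbf{G}_2$ are diagonal matrices of the margins:

```python
import numpy as np

# Hypothetical categorical data for n = 6 cases: a row variable with
# p = 2 levels and a column variable with q = 3 levels.
rows = np.array([0, 0, 1, 1, 0, 1])
cols = np.array([0, 2, 1, 1, 2, 0])

p, q = 2, 3
G1 = np.eye(p)[rows]   # n x p indicator matrix of row membership
G2 = np.eye(q)[cols]   # n x q indicator matrix of column membership

X = G1.T @ G2          # p x q contingency table of counts
R = G1.T @ G1          # diagonal matrix of row totals
C = G2.T @ G2          # diagonal matrix of column totals

print(X)
print(np.diag(R), np.diag(C))
```

Each row of $\mathbf{G}_1$ (or $\mathbf{G}_2$) has a single 1 marking the category of that case, so the cross-product $\mathbf{G}_1'\mathbf{G}_2$ counts co-occurrences of category pairs.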
Next, we define quantification vectors $\mathbf{z}_1 : p \times 1$ and $\mathbf{z}_2 : q \times 1$, to be determined, which transform the categorical variables into quantitative variables $\mathbf{G}_1\mathbf{z}_1$ and $\mathbf{G}_2\mathbf{z}_2$. These two variables have squared (uncentred) correlation $\rho^2$ given by
$$
\rho^2 = \frac{(\mathbf{z}_1'\mathbf{G}_1'\mathbf{G}_2\mathbf{z}_2)^2}{(\mathbf{z}_1'\mathbf{G}_1'\mathbf{G}_1\mathbf{z}_1)(\mathbf{z}_2'\mathbf{G}_2'\mathbf{G}_2\mathbf{z}_2)}
= \frac{(\mathbf{z}_1'\mathbf{X}\mathbf{z}_2)^2}{(\mathbf{z}_1'\mathbf{R}\mathbf{z}_1)(\mathbf{z}_2'\mathbf{C}\mathbf{z}_2)}.
\qquad (7.26)
$$
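Maximizing (7.26) over $\mathbf{z}_1$ and $\mathbf{z}_2$ can be recast as a singular value decomposition: with $\mathbf{S} = \mathbf{R}^{-1/2}\mathbf{X}\mathbf{C}^{-1/2}$, the singular values of $\mathbf{S}$ are the attainable correlations, the largest being the trivial $\rho = 1$ given by constant quantifications. A sketch under those assumptions (the table is invented, and the SVD route is a standard reformulation rather than necessarily the derivation used here):

```python
import numpy as np

# Invented 3x3 contingency table of counts (hypothetical data).
X = np.array([[20, 10,  5],
              [10, 25, 10],
              [ 5, 10, 30]], dtype=float)
R = np.diag(X.sum(axis=1))      # row-total matrix
C = np.diag(X.sum(axis=0))      # column-total matrix

# Singular values of S = R^{-1/2} X C^{-1/2} are the attainable
# correlations rho; the largest is the trivial rho = 1 (constant
# quantifications), the second is Hirschfeld's maximal correlation.
Rm = np.diag(1.0 / np.sqrt(np.diag(R)))
Cm = np.diag(1.0 / np.sqrt(np.diag(C)))
U, s, Vt = np.linalg.svd(Rm @ X @ Cm)

z1 = Rm @ U[:, 1]               # optimal row quantification
z2 = Cm @ Vt[1, :]              # optimal column quantification

# Plugging z1, z2 into (7.26) recovers rho^2 = s[1]^2.
rho2 = (z1 @ X @ z2) ** 2 / ((z1 @ R @ z1) * (z2 @ C @ z2))
print(s, rho2)
```

The back-transformed singular vectors $\mathbf{z}_1 = \mathbf{R}^{-1/2}\mathbf{u}$ and $\mathbf{z}_2 = \mathbf{C}^{-1/2}\mathbf{v}$ satisfy $\mathbf{z}_1'\mathbf{R}\mathbf{z}_1 = \mathbf{z}_2'\mathbf{C}\mathbf{z}_2 = 1$, so (7.26) reduces to the squared singular value.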