On Pseudo-Statistical Independence in a Contingency Table - Data Mining: Foundations and Practice


5 Statistical Independence in m × n Contingency Table

Let us consider an m × n contingency table shown in Table 2. Statistical independence of $R_1$ and $R_2$ gives the following formulae:

$$P([R_1 = A_i, R_2 = B_j]) = P([R_1 = A_i])\,P([R_2 = B_j]) \quad (i = 1, \cdots, m,\; j = 1, \cdots, n).$$

According to the definition of the table,

$$\frac{x_{ij}}{N} = \frac{\sum_{k=1}^{n} x_{ik}}{N} \times \frac{\sum_{l=1}^{m} x_{lj}}{N}. \tag{13}$$

Thus, we have obtained:

$$x_{ij} = \frac{\sum_{k=1}^{n} x_{ik} \times \sum_{l=1}^{m} x_{lj}}{N}. \tag{14}$$
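As a rough illustration of equation (14), the following Python sketch computes, for each cell, the product of its row sum and column sum divided by N, and compares it with the observed count. The function names and the two small tables are hypothetical and chosen only for this example; the sketch assumes a non-empty table of integer counts with N > 0.

```python
from fractions import Fraction


def expected_under_independence(table):
    """Return the table whose cells are (row sum * column sum) / N, as in equation (14)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n_total = sum(row_sums)  # N, assumed to be positive
    return [[Fraction(r * c, n_total) for c in col_sums] for r in row_sums]


def satisfies_equation_14(table):
    """True when every observed cell equals its value under independence."""
    expected = expected_under_independence(table)
    return all(
        Fraction(x) == e
        for row, exp_row in zip(table, expected)
        for x, e in zip(row, exp_row)
    )


# Hypothetical 2 x 3 tables: the first satisfies equation (14), the second does not.
independent = [[2, 4, 6],
               [3, 6, 9]]
dependent = [[2, 4, 6],
             [3, 6, 10]]

print(satisfies_equation_14(independent))
print(satisfies_equation_14(dependent))
```

Exact rational arithmetic (`fractions.Fraction`) is used here so that the equality test is not affected by floating-point rounding.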

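Thus, for a fixed $j$, dividing equation (14) for row $i_a$ by the same equation for row $i_b$ cancels the column sum and $N$ (assuming $x_{i_b j} \neq 0$):

$$\frac{x_{i_a j}}{x_{i_b j}} = \frac{\sum_{k=1}^{n} x_{i_a k}}{\sum_{k=1}^{n} x_{i_b k}}.$$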

In the same way, for a fixed i ,

$$\frac{x_{i j_a}}{x_{i j_b}} = \frac{\sum_{l=1}^{m} x_{l j_a}}{\sum_{l=1}^{m} x_{l j_b}}.$$

Since this relation will hold for any j , the following equation is obtained:

$$\frac{x_{i_a 1}}{x_{i_b 1}} = \frac{x_{i_a 2}}{x_{i_b 2}} = \cdots = \frac{x_{i_a n}}{x_{i_b n}} = \frac{\sum_{k=1}^{n} x_{i_a k}}{\sum_{k=1}^{n} x_{i_b k}}. \tag{15}$$

Since the right-hand side of the above equation does not depend on $j$, all the ratios are equal to the same constant. Thus,

Theorem 4. If two attributes in a contingency table shown in Table 2 are statistically independent, the following equations hold:

$$\frac{x_{i_a 1}}{x_{i_b 1}} = \frac{x_{i_a 2}}{x_{i_b 2}} = \cdots = \frac{x_{i_a n}}{x_{i_b n}} = \text{const.} \tag{16}$$

for all rows $i_a$ and $i_b$ ($i_a, i_b = 1, 2, \cdots, m$).
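The condition in Theorem 4 can be checked numerically. The sketch below is a minimal illustration rather than a definitive implementation: `row_ratios` and `rows_are_proportional` are hypothetical helper names, rows are indexed from 0 rather than 1, and every cell is assumed to be a nonzero integer count so that each ratio is defined.

```python
from fractions import Fraction


def row_ratios(table, i_a, i_b):
    """Ratios x[i_a][j] / x[i_b][j] for every column j (0-based row indices)."""
    return [Fraction(a, b) for a, b in zip(table[i_a], table[i_b])]


def rows_are_proportional(table):
    """True when, for every pair of rows, the column-wise ratios are all equal."""
    m = len(table)
    for i_a in range(m):
        for i_b in range(m):
            ratios = row_ratios(table, i_a, i_b)
            if len(set(ratios)) > 1:  # more than one distinct ratio
                return False
    return True


# Hypothetical 3 x 3 table whose rows are multiples of (1, 2, 3).
table = [[2, 4, 6],
         [3, 6, 9],
         [1, 2, 3]]

print(row_ratios(table, 0, 1))
print(rows_are_proportional(table))
```

For this table, each ratio between the first and second rows is 2/3, in line with equation (16). Changing a single cell, for instance replacing 9 with 10, makes the ratios differ and the check fail.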

6 Contingency Matrix

The meaning of the above discussions will become much clearer when we view

a contingency table as a matrix.
