Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods - Data Mining: Concepts and Techniques

four such measures: all_confidence, max_confidence, Kulczynski, and cosine. We'll then compare their effectiveness with respect to one another and with respect to the lift and χ² measures.

Given two itemsets, A and B, the all_confidence measure of A and B is defined as

$$\text{all\_conf}(A, B) = \frac{\text{sup}(A \cup B)}{\max\{\text{sup}(A), \text{sup}(B)\}} = \min\{P(A \mid B), P(B \mid A)\}, \tag{6.9}$$

where max{sup(A), sup(B)} is the maximum support of the itemsets A and B. Thus, all_conf(A, B) is also the minimum confidence of the two association rules related to A and B, namely, "A ⇒ B" and "B ⇒ A."
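As a minimal sketch of Eq. (6.9), the Python snippet below computes all_conf from raw support counts and checks that it matches the smaller of the two rule confidences. The function name `all_confidence` and the counts are hypothetical, chosen only for illustration, and both itemsets are assumed to occur at least once.

```python
def all_confidence(sup_a: int, sup_b: int, sup_ab: int) -> float:
    """all_conf(A, B) = sup(A ∪ B) / max(sup(A), sup(B)).

    Arguments are support counts; assumes sup_a and sup_b are positive.
    """
    return sup_ab / max(sup_a, sup_b)


# Hypothetical counts: A occurs in 1,000 transactions, B in 250, both in 200.
sup_a, sup_b, sup_ab = 1000, 250, 200

conf_a_to_b = sup_ab / sup_a  # confidence of "A => B", i.e. P(B|A)
conf_b_to_a = sup_ab / sup_b  # confidence of "B => A", i.e. P(A|B)

value = all_confidence(sup_a, sup_b, sup_ab)
# The support-ratio form and the min-of-confidences form agree.
assert abs(value - min(conf_a_to_b, conf_b_to_a)) < 1e-12
print(value)  # 0.2
```

Dividing by the larger support is what selects the weaker of the two rules, so a high all_conf value requires both rules to be strong.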

Given two itemsets, A and B, the max_confidence measure of A and B is defined as

$$\text{max\_conf}(A, B) = \max\{P(A \mid B), P(B \mid A)\}. \tag{6.10}$$

The max_conf measure is the maximum confidence of the two association rules, "A ⇒ B" and "B ⇒ A."
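To make the definition concrete, here is a small sketch that counts supports directly from a list of transactions and then takes the larger of the two rule confidences. The helper names `support_count` and `max_confidence` and the toy transactions are assumptions made up for this example; the code also assumes each itemset appears in at least one transaction.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def max_confidence(transactions, a, b):
    """max_conf(A, B) = max(P(A|B), P(B|A)), estimated from counts."""
    sup_a = support_count(transactions, a)
    sup_b = support_count(transactions, b)
    sup_ab = support_count(transactions, a | b)  # contains both A and B
    conf_a_to_b = sup_ab / sup_a  # P(B|A)
    conf_b_to_a = sup_ab / sup_b  # P(A|B)
    return max(conf_a_to_b, conf_b_to_a)


# Hypothetical toy data: each transaction is a set of items.
transactions = [
    {"milk", "coffee"},
    {"milk", "coffee"},
    {"milk"},
    {"milk"},
    {"coffee"},
    {"bread"},
]

# sup(milk) = 4, sup(coffee) = 3, sup(milk and coffee) = 2,
# so the confidences are 2/4 and 2/3 and max_conf is 2/3.
print(max_confidence(transactions, {"milk"}, {"coffee"}))
```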

Given two itemsets, A and B, the Kulczynski measure of A and B (abbreviated as Kulc) is defined as

$$\text{Kulc}(A, B) = \frac{1}{2}\bigl(P(A \mid B) + P(B \mid A)\bigr). \tag{6.11}$$

It was proposed in 1927 by Polish mathematician S. Kulczynski. It can be viewed as an average of two confidence measures. That is, it is the average of two conditional probabilities: the probability of itemset B given itemset A, and the probability of itemset A given itemset B.
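The averaging in Eq. (6.11) can be sketched as follows. Because Kulc is the arithmetic mean of the two conditional probabilities, it always lies between their minimum and maximum; the snippet checks this on hypothetical counts. The function name `kulczynski` is an assumption for illustration, and both supports are assumed to be positive.

```python
def kulczynski(sup_a: int, sup_b: int, sup_ab: int) -> float:
    """Kulc(A, B) = (P(A|B) + P(B|A)) / 2, computed from support counts."""
    p_a_given_b = sup_ab / sup_b
    p_b_given_a = sup_ab / sup_a
    return 0.5 * (p_a_given_b + p_b_given_a)


# Hypothetical counts: A occurs in 1,000 transactions, B in 250, both in 200.
sup_a, sup_b, sup_ab = 1000, 250, 200

kulc = kulczynski(sup_a, sup_b, sup_ab)   # (0.8 + 0.2) / 2
lowest = sup_ab / max(sup_a, sup_b)       # smaller rule confidence
highest = sup_ab / min(sup_a, sup_b)      # larger rule confidence

# An average of two numbers cannot fall outside the pair it averages.
assert lowest <= kulc <= highest
print(lowest, kulc, highest)
```

With these counts the two confidences are far apart (0.2 and 0.8), and Kulc lands in the middle at about 0.5, which on its own does not reveal how imbalanced the two directions are.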

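Finally, given two itemsets, A and B, the cosine measure of A and B is defined as

$$\text{cosine}(A, B) = \frac{P(A \cup B)}{\sqrt{P(A) \times P(B)}} = \frac{\text{sup}(A \cup B)}{\sqrt{\text{sup}(A) \times \text{sup}(B)}} = \sqrt{P(A \mid B) \times P(B \mid A)}. \tag{6.12}$$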

The cosine measure can be viewed as a harmonized lift measure: The two formulae are similar except that for cosine, the square root is taken on the product of the probabilities of A and B. This is an important difference, however, because by taking the square root, the cosine value is only influenced by the supports of A, B, and A ∪ B, and not by the total number of transactions.
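The contrast with lift can be illustrated with a short sketch. It assumes the standard definition lift(A, B) = P(A ∪ B) / (P(A) P(B)), which is not restated in this excerpt, and uses hypothetical support counts. Holding the three supports fixed while growing the total number of transactions n changes lift but leaves cosine untouched, because n cancels under the square root.

```python
import math


def lift(sup_a: int, sup_b: int, sup_ab: int, n: int) -> float:
    """lift(A, B) = P(A ∪ B) / (P(A) * P(B)), with probabilities = count / n."""
    return (sup_ab / n) / ((sup_a / n) * (sup_b / n))


def cosine(sup_a: int, sup_b: int, sup_ab: int) -> float:
    """cosine(A, B) = sup(A ∪ B) / sqrt(sup(A) * sup(B)); n is not needed."""
    return sup_ab / math.sqrt(sup_a * sup_b)


# Hypothetical counts: A occurs in 1,000 transactions, B in 250, both in 200.
sup_a, sup_b, sup_ab = 1000, 250, 200

# Same supports, but the database is padded with transactions containing
# neither A nor B: only n changes.
for n in (2_000, 200_000):
    print(n, round(lift(sup_a, sup_b, sup_ab, n), 3),
          round(cosine(sup_a, sup_b, sup_ab), 3))
```

In this example lift grows from about 1.6 to about 160 as n goes from 2,000 to 200,000, while cosine stays at 0.4.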

Each of these four measures has the following property: Its value is only influenced by the supports of A, B, and A ∪ B, or more exactly, by the conditional probabilities P(A|B) and P(B|A), but not by the total number of transactions. Another common property is that each measure ranges from 0 to 1, and the higher the value, the closer the relationship between A and B.

Now, together with lift and χ², we have introduced in total six pattern evaluation measures. You may wonder, "Which is the best in assessing the discovered pattern relationships?" To answer this question, we examine their performance on some typical data sets.
