four such measures: all_confidence, max_confidence, Kulczynski, and cosine. We'll then
compare their effectiveness with respect to one another and with respect to the lift and
$\chi^2$ measures.
Given two itemsets, A and B, the all_confidence measure of A and B is defined as

$$\mathit{all\_conf}(A, B) = \frac{\mathit{sup}(A \cup B)}{\max\{\mathit{sup}(A), \mathit{sup}(B)\}} = \min\{P(A \mid B), P(B \mid A)\}, \tag{6.9}$$

where $\max\{\mathit{sup}(A), \mathit{sup}(B)\}$ is the maximum support of the itemsets A and B. Thus,
$\mathit{all\_conf}(A, B)$ is also the minimum confidence of the two association rules related to
A and B, namely, “A ⇒ B” and “B ⇒ A.”
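
As a minimal sketch of Eq. (6.9), the following Python computes all_confidence from support counts and checks that it coincides with the smaller of the two rule confidences. The function names and the counts are hypothetical and chosen only for illustration.

```python
import math


def all_confidence(sup_a: int, sup_b: int, sup_ab: int) -> float:
    """all_conf(A, B) = sup(A ∪ B) / max{sup(A), sup(B)}."""
    if sup_a <= 0 or sup_b <= 0:
        raise ValueError("both itemsets must have positive support")
    if sup_ab > min(sup_a, sup_b):
        raise ValueError("sup(A ∪ B) cannot exceed sup(A) or sup(B)")
    return sup_ab / max(sup_a, sup_b)


def confidence(sup_antecedent: int, sup_ab: int) -> float:
    """Confidence of a rule: sup(A ∪ B) divided by the antecedent's support."""
    return sup_ab / sup_antecedent


# Hypothetical support counts (not taken from any real data set).
sup_a, sup_b, sup_ab = 1000, 400, 300

conf_a_to_b = confidence(sup_a, sup_ab)  # P(B | A), confidence of A => B
conf_b_to_a = confidence(sup_b, sup_ab)  # P(A | B), confidence of B => A

# Dividing by the larger support yields the smaller confidence.
assert math.isclose(all_confidence(sup_a, sup_b, sup_ab),
                    min(conf_a_to_b, conf_b_to_a))
```

Dividing by the larger of the two supports is what makes the ratio equal to the smaller conditional probability, so no explicit `min` is needed inside `all_confidence`.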
Given two itemsets, A and B, the max_confidence measure of A and B is defined as

$$\mathit{max\_conf}(A, B) = \max\{P(A \mid B), P(B \mid A)\}. \tag{6.10}$$

The max_conf measure is the maximum confidence of the two association rules,
“A ⇒ B” and “B ⇒ A.”
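
To illustrate Eq. (6.10), here is a small sketch that counts supports directly from a list of transactions and returns the larger of the two rule confidences. The transactions, item names, and helper functions are invented for this example.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def max_confidence(transactions, a, b):
    """max_conf(A, B) = max{P(A | B), P(B | A)}, estimated from counts."""
    sup_a = support_count(transactions, a)
    sup_b = support_count(transactions, b)
    sup_ab = support_count(transactions, a | b)
    if sup_a == 0 or sup_b == 0:
        raise ValueError("both itemsets must occur at least once")
    return max(sup_ab / sup_b, sup_ab / sup_a)


# Hypothetical transactions, made up for illustration only.
transactions = [
    {"milk", "coffee"},
    {"milk", "coffee"},
    {"milk"},
    {"milk"},
    {"milk", "bread"},
    {"bread"},
]
a = frozenset({"milk"})
b = frozenset({"coffee"})

# Every coffee transaction also contains milk, so P(milk | coffee) = 2/2,
# while P(coffee | milk) = 2/5; max_conf picks the larger of the two.
print(max_confidence(transactions, a, b))
```

In this toy data the two confidences differ sharply, which is the situation where max_conf and all_conf give very different readings of the same pair of itemsets.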
Given two itemsets, A and B, the Kulczynski measure of A and B (abbreviated as
Kulc) is defined as

$$\mathit{Kulc}(A, B) = \frac{1}{2}\bigl(P(A \mid B) + P(B \mid A)\bigr). \tag{6.11}$$

It was proposed in 1927 by Polish mathematician S. Kulczynski. It can be viewed as an
average of two confidence measures. That is, it is the average of two conditional
probabilities: the probability of itemset B given itemset A, and the probability of itemset A
given itemset B.
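
The averaging in Eq. (6.11) can be sketched as follows. The function name and both sets of counts are hypothetical; the second case is deliberately imbalanced to show how the average behaves when the two confidences disagree.

```python
import math


def kulczynski(sup_a: int, sup_b: int, sup_ab: int) -> float:
    """Kulc(A, B) = (P(A | B) + P(B | A)) / 2, from support counts."""
    if sup_a <= 0 or sup_b <= 0:
        raise ValueError("both itemsets must have positive support")
    p_a_given_b = sup_ab / sup_b
    p_b_given_a = sup_ab / sup_a
    return 0.5 * (p_a_given_b + p_b_given_a)


# Balanced case (hypothetical counts): both confidences are 0.5.
balanced = kulczynski(sup_a=1000, sup_b=1000, sup_ab=500)

# Imbalanced case (hypothetical counts): P(B | A) = 0.1 but P(A | B) = 1.0.
imbalanced = kulczynski(sup_a=1000, sup_b=100, sup_ab=100)

assert math.isclose(balanced, 0.5)
assert math.isclose(imbalanced, 0.55)
```

In the imbalanced case the result sits midway between the two confidences, so Kulc alone does not reveal how far apart they are.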
Finally, given two itemsets, A and B, the cosine measure of A and B is defined as

$$\mathit{cosine}(A, B) = \frac{P(A \cup B)}{\sqrt{P(A) \times P(B)}} = \frac{\mathit{sup}(A \cup B)}{\sqrt{\mathit{sup}(A) \times \mathit{sup}(B)}} = \sqrt{P(A \mid B) \times P(B \mid A)}. \tag{6.12}$$

The cosine measure can be viewed as a harmonized lift measure: The two formulae are
similar except that for cosine, the square root is taken on the product of the probabilities
of A and B. This is an important difference, however, because by taking the square root,
the cosine value is only influenced by the supports of A, B, and $A \cup B$, and not by the
total number of transactions.
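
The effect of the square root can be checked numerically. The sketch below assumes the usual definition of lift, P(A ∪ B) / (P(A) × P(B)), and uses hypothetical support counts; it holds the three supports fixed and varies only the total number of transactions `n`.

```python
import math


def cosine(sup_a: int, sup_b: int, sup_ab: int) -> float:
    """cosine(A, B) = sup(A ∪ B) / sqrt(sup(A) * sup(B))."""
    return sup_ab / math.sqrt(sup_a * sup_b)


def cosine_from_probabilities(sup_a: int, sup_b: int, sup_ab: int, n: int) -> float:
    """Same measure written with probabilities; n cancels out."""
    return (sup_ab / n) / math.sqrt((sup_a / n) * (sup_b / n))


def lift(sup_a: int, sup_b: int, sup_ab: int, n: int) -> float:
    """lift(A, B) = P(A ∪ B) / (P(A) * P(B)); n does not cancel."""
    return (sup_ab / n) / ((sup_a / n) * (sup_b / n))


# Hypothetical supports; only the total transaction count n changes.
sup_a, sup_b, sup_ab = 1000, 1000, 100

for n in (2_000, 200_000):
    # The probability form agrees with the support form for every n.
    assert math.isclose(cosine_from_probabilities(sup_a, sup_b, sup_ab, n),
                        cosine(sup_a, sup_b, sup_ab))
    print(n, round(cosine(sup_a, sup_b, sup_ab), 3),
          round(lift(sup_a, sup_b, sup_ab, n), 3))
```

With these counts the cosine stays at 0.1 for both values of `n`, while lift moves from below 1 to well above 1 purely because more transactions containing neither A nor B were added.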
Each of these four measures defined has the following property: Its value is only
influenced by the supports of A, B, and $A \cup B$, or more exactly, by the conditional
probabilities of $P(A \mid B)$ and $P(B \mid A)$, but not by the total number of transactions. Another
common property is that each measure ranges from 0 to 1, and the higher the value, the
closer the relationship between A and B.
Now, together with lift and $\chi^2$, we have introduced in total six pattern evaluation
measures. You may wonder, “Which is the best in assessing the discovered pattern
relationships?” To answer this question, we examine their performance on some typical
data sets.
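
For readers who want to run such a comparison themselves, the following sketch computes all six measures from a 2 × 2 contingency table. It assumes the standard definition of lift and the Pearson $\chi^2$ statistic for a 2 × 2 table without continuity correction; the function name, the argument names, and the counts are hypothetical.

```python
import math


def evaluate_pattern(both: int, a_only: int, b_only: int, neither: int) -> dict:
    """Six pattern evaluation measures from a 2 x 2 contingency table.

    both    -- transactions containing A and B
    a_only  -- transactions containing A but not B
    b_only  -- transactions containing B but not A
    neither -- transactions containing neither A nor B
    """
    n = both + a_only + b_only + neither
    sup_a = both + a_only
    sup_b = both + b_only
    if sup_a == 0 or sup_b == 0 or sup_a == n or sup_b == n:
        raise ValueError("every row and column total must be positive")

    p_b_given_a = both / sup_a
    p_a_given_b = both / sup_b

    # Pearson chi-square for a 2 x 2 table (no continuity correction).
    chi2 = (n * (both * neither - a_only * b_only) ** 2
            / (sup_a * sup_b * (n - sup_a) * (n - sup_b)))

    return {
        "chi2": chi2,
        "lift": both * n / (sup_a * sup_b),
        "all_conf": min(p_a_given_b, p_b_given_a),
        "max_conf": max(p_a_given_b, p_b_given_a),
        "kulc": 0.5 * (p_a_given_b + p_b_given_a),
        "cosine": math.sqrt(p_a_given_b * p_b_given_a),
    }


# Hypothetical counts for a strongly associated pair of itemsets.
measures = evaluate_pattern(both=10_000, a_only=1_000,
                            b_only=1_000, neither=100_000)

# The four measures built from conditional probabilities stay within [0, 1].
for name in ("all_conf", "max_conf", "kulc", "cosine"):
    assert 0.0 <= measures[name] <= 1.0

for name, value in measures.items():
    print(f"{name:8s} {value:.3f}")
```

Only `chi2` and `lift` use `n`; the other four entries are computed from the two conditional probabilities alone, in line with the property stated above.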
 