four such measures: all confidence, max confidence, Kulczynski, and cosine. We'll then compare their effectiveness with respect to one another and with respect to the lift and $\chi^2$ measures.
Given two itemsets, A and B, the all confidence measure of A and B is defined as

$$\mathit{all\_conf}(A,B) = \frac{\mathit{sup}(A \cup B)}{\max\{\mathit{sup}(A), \mathit{sup}(B)\}} = \min\{P(A \mid B), P(B \mid A)\}, \tag{6.9}$$
where $\max\{\mathit{sup}(A), \mathit{sup}(B)\}$ is the maximum support of the itemsets A and B. Thus, $\mathit{all\_conf}$ is also the minimum confidence of the two association rules related to A and B, namely, "$A \Rightarrow B$" and "$B \Rightarrow A$."
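To make Equation (6.9) concrete, here is a minimal Python sketch that assumes the absolute support counts of A, B, and $A \cup B$ are already known. The function names `all_conf` and `confidences` and the counts used are hypothetical illustrations, not part of the text. The sketch checks that the support-based form and the minimum of the two rule confidences give the same value.

```python
def all_conf(sup_a: int, sup_b: int, sup_ab: int) -> float:
    """All confidence, Eq. (6.9): sup(A u B) / max{sup(A), sup(B)}."""
    return sup_ab / max(sup_a, sup_b)


def confidences(sup_a: int, sup_b: int, sup_ab: int):
    """Return (conf(A => B), conf(B => A)), i.e. (P(B|A), P(A|B))."""
    return sup_ab / sup_a, sup_ab / sup_b


# Hypothetical support counts (illustration only):
# 400 transactions contain A, 1600 contain B, 200 contain both.
sup_a, sup_b, sup_ab = 400, 1600, 200
print(all_conf(sup_a, sup_b, sup_ab))          # 0.125
print(min(confidences(sup_a, sup_b, sup_ab)))  # 0.125 (same value)
```

Both values agree because the smaller of the two confidences is always the one whose antecedent has the larger support, which is exactly the denominator in the first form.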
Given two itemsets, A and B, the max confidence measure of A and B is defined as

$$\mathit{max\_conf}(A,B) = \max\{P(A \mid B), P(B \mid A)\}. \tag{6.10}$$
The $\mathit{max\_conf}$ measure is the maximum confidence of the two association rules, "$A \Rightarrow B$" and "$B \Rightarrow A$."
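A matching sketch for Equation (6.10), again assuming known support counts; the name `max_conf` and the counts are hypothetical. Because both confidences share the numerator $\mathit{sup}(A \cup B)$, the larger one is the one divided by the smaller support, which the second print confirms.

```python
def max_conf(sup_a: int, sup_b: int, sup_ab: int) -> float:
    """Max confidence, Eq. (6.10): the larger of conf(A => B) and conf(B => A)."""
    p_b_given_a = sup_ab / sup_a  # confidence of A => B
    p_a_given_b = sup_ab / sup_b  # confidence of B => A
    return max(p_a_given_b, p_b_given_a)


# Hypothetical counts: sup(A) = 400, sup(B) = 1600, sup(A u B) = 200.
print(max_conf(400, 1600, 200))  # 0.5
print(200 / min(400, 1600))      # 0.5, the equivalent support-based form
```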
Given two itemsets, A and B, the Kulczynski measure of A and B (abbreviated as Kulc) is defined as

$$\mathit{Kulc}(A,B) = \frac{1}{2}\bigl(P(A \mid B) + P(B \mid A)\bigr). \tag{6.11}$$
It was proposed in 1927 by Polish mathematician S. Kulczynski. It can be viewed as an average of two confidence measures. That is, it is the average of two conditional probabilities: the probability of itemset B given itemset A, and the probability of itemset A given itemset B.
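As a minimal sketch of Equation (6.11), the Kulczynski measure is just the arithmetic mean of the two conditional probabilities computed from support counts. The function name `kulc` and the counts are hypothetical.

```python
def kulc(sup_a: int, sup_b: int, sup_ab: int) -> float:
    """Kulczynski measure, Eq. (6.11): mean of P(A|B) and P(B|A)."""
    p_b_given_a = sup_ab / sup_a  # P(B|A)
    p_a_given_b = sup_ab / sup_b  # P(A|B)
    return (p_a_given_b + p_b_given_a) / 2


# Hypothetical counts: P(B|A) = 200/400 = 0.5 and P(A|B) = 200/1600 = 0.125.
print(kulc(400, 1600, 200))  # 0.3125
```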
Finally, given two itemsets, A and B, the cosine measure of A and B is defined as

$$\mathit{cosine}(A,B) = \frac{P(A \cup B)}{\sqrt{P(A) \times P(B)}} = \frac{\mathit{sup}(A \cup B)}{\sqrt{\mathit{sup}(A) \times \mathit{sup}(B)}} = \sqrt{P(A \mid B) \times P(B \mid A)}. \tag{6.12}$$
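The three forms of Equation (6.12) can be checked against each other with a short Python sketch. It assumes a hypothetical total of n = 3200 transactions and the same hypothetical counts as before; the three function names are illustrative only. Only the first form needs n, since the other two are written purely in terms of supports or confidences.

```python
import math


def cosine_from_probs(p_a: float, p_b: float, p_ab: float) -> float:
    """First form: P(A u B) / sqrt(P(A) * P(B))."""
    return p_ab / math.sqrt(p_a * p_b)


def cosine_from_counts(sup_a: int, sup_b: int, sup_ab: int) -> float:
    """Second form: sup(A u B) / sqrt(sup(A) * sup(B))."""
    return sup_ab / math.sqrt(sup_a * sup_b)


def cosine_from_confidences(sup_a: int, sup_b: int, sup_ab: int) -> float:
    """Third form: sqrt(P(A|B) * P(B|A)), the geometric mean of the confidences."""
    return math.sqrt((sup_ab / sup_b) * (sup_ab / sup_a))


n = 3200  # hypothetical total number of transactions
sup_a, sup_b, sup_ab = 400, 1600, 200
print(cosine_from_probs(sup_a / n, sup_b / n, sup_ab / n))  # 0.25
print(cosine_from_counts(sup_a, sup_b, sup_ab))             # 0.25
print(cosine_from_confidences(sup_a, sup_b, sup_ab))        # 0.25
```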
The cosine measure can be viewed as a harmonized lift measure: the two formulae are similar except that for cosine, the square root is taken on the product of the probabilities of A and B. This is an important difference, however: because of the square root, the total number of transactions cancels out of the ratio, so the cosine value is influenced only by the supports of A, B, and $A \cup B$, and not by the total number of transactions.
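The difference is easy to see numerically. In the Python sketch below (function names and counts are hypothetical), lift is computed as $P(A \cup B)/(P(A)P(B))$ and cosine from support counts. Doubling the total number of transactions by adding null transactions, which contain neither A nor B, leaves every support unchanged: lift doubles, while cosine does not move.

```python
import math


def lift(sup_a: int, sup_b: int, sup_ab: int, n: int) -> float:
    """lift(A, B) = P(A u B) / (P(A) * P(B)); the result depends on n."""
    return (sup_ab / n) / ((sup_a / n) * (sup_b / n))


def cosine(sup_a: int, sup_b: int, sup_ab: int) -> float:
    """cosine(A, B) = sup(A u B) / sqrt(sup(A) * sup(B)); n never appears."""
    return sup_ab / math.sqrt(sup_a * sup_b)


sup_a, sup_b, sup_ab = 400, 1600, 200
# 6400 = the original 3200 transactions plus 3200 hypothetical null transactions.
for n in (3200, 6400):
    print(n, lift(sup_a, sup_b, sup_ab, n), cosine(sup_a, sup_b, sup_ab))
# 3200 1.0 0.25
# 6400 2.0 0.25
```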
Each of the four measures defined above has the following property: its value is influenced only by the supports of A, B, and $A \cup B$, or more exactly, by the conditional probabilities $P(A \mid B)$ and $P(B \mid A)$, but not by the total number of transactions. Another common property is that each measure ranges from 0 to 1, and the higher the value, the closer the relationship between A and B.
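Both properties can be illustrated by computing all four measures from the two conditional probabilities alone. In this sketch the function name `null_invariant_measures` and all counts are hypothetical; note that the total transaction count is not even a parameter. Because cosine and Kulc are the geometric and arithmetic means of the two confidences, the values for any input also satisfy all_conf ≤ cosine ≤ Kulc ≤ max_conf.

```python
import math


def null_invariant_measures(sup_a: int, sup_b: int, sup_ab: int) -> dict:
    """Return all_conf, max_conf, Kulc, and cosine from support counts only."""
    p_b_given_a = sup_ab / sup_a  # P(B|A)
    p_a_given_b = sup_ab / sup_b  # P(A|B)
    return {
        "all_conf": min(p_a_given_b, p_b_given_a),        # Eq. (6.9)
        "max_conf": max(p_a_given_b, p_b_given_a),        # Eq. (6.10)
        "kulc": (p_a_given_b + p_b_given_a) / 2,          # Eq. (6.11)
        "cosine": math.sqrt(p_a_given_b * p_b_given_a),   # Eq. (6.12)
    }


print(null_invariant_measures(400, 1600, 200))
# {'all_conf': 0.125, 'max_conf': 0.5, 'kulc': 0.3125, 'cosine': 0.25}
print(null_invariant_measures(500, 500, 500))  # A and B always co-occur: every value is 1.0
print(null_invariant_measures(500, 500, 0))    # A and B never co-occur: every value is 0.0
```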
Now, together with lift and $\chi^2$, we have introduced six pattern evaluation measures in total. You may wonder, "Which is the best in assessing the discovered pattern relationships?" To answer this question, we examine their performance on some typical data sets.