Similarly, in $D_3$, the four new measures correctly show that $m$ and $c$ are strongly negatively associated because the $mc$ to $c$ ratio equals the $mc$ to $m$ ratio, that is, $100/1100 = 9.1\%$. However, lift and $\chi^2$ both contradict this in an incorrect way: Their values for $D_2$ are between those for $D_1$ and $D_3$.

For data set $D_4$, both lift and $\chi^2$ indicate a highly positive association between $m$ and $c$, whereas the others indicate a “neutral” association because the ratio of $mc$ to $\overline{m}c$ equals the ratio of $mc$ to $m\overline{c}$, which is 1. This means that if a customer buys coffee (or milk), the probability that he or she will also purchase milk (or coffee) is exactly 50%.
“Why are lift and $\chi^2$ so poor at distinguishing pattern association relationships in the previous transactional data sets?” To answer this, we have to consider the null-transactions. A null-transaction is a transaction that does not contain any of the itemsets being examined. In our example, $\overline{m}\,\overline{c}$ represents the number of null-transactions. Lift and $\chi^2$ have difficulty distinguishing interesting pattern association relationships because they are both strongly influenced by $\overline{m}\,\overline{c}$. Typically, the number of null-transactions can outweigh the number of individual purchases because, for example, many people may buy neither milk nor coffee. On the other hand, the other four measures are good indicators of interesting pattern associations because their definitions remove the influence of $\overline{m}\,\overline{c}$ (i.e., they are not influenced by the number of null-transactions).
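The dependence on null-transactions can be made explicit by writing the measures in terms of contingency-table counts. The count notation below ($n_{mc}$, $n_{\overline{m}c}$, $n_{m\overline{c}}$, $n_{\overline{m}\,\overline{c}}$, and the total $N$) is introduced here only for illustration and is not the text's own; lift and Kulczynski are used as one representative of each group.

```latex
\begin{align*}
N &= n_{mc} + n_{\overline{m}c} + n_{m\overline{c}} + n_{\overline{m}\,\overline{c}} \\[4pt]
\mathit{lift}(m, c) &= \frac{P(m \cup c)}{P(m)\,P(c)}
  = \frac{N \, n_{mc}}{(n_{mc} + n_{m\overline{c}})\,(n_{mc} + n_{\overline{m}c})} \\[4pt]
\mathit{Kulc}(m, c) &= \frac{1}{2}\left(
  \frac{n_{mc}}{n_{mc} + n_{m\overline{c}}} +
  \frac{n_{mc}}{n_{mc} + n_{\overline{m}c}} \right)
\end{align*}
```

The null count $n_{\overline{m}\,\overline{c}}$ enters lift through $N$, so lift grows as null-transactions are added, whereas it does not appear anywhere in the Kulczynski expression.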
This discussion shows that it is highly desirable to have a measure that has a value that is independent of the number of null-transactions. A measure is null-invariant if its value is free from the influence of null-transactions. Null-invariance is an important property for measuring association patterns in large transaction databases. Among the six discussed measures in this subsection, only lift and $\chi^2$ are not null-invariant measures.
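The definition can be checked numerically. The following is a minimal sketch, not taken from the text: the function name `association_measures`, its argument names, and the counts are all illustrative assumptions. It computes lift and the four null-invariant measures from the cells of a 2 × 2 contingency table and varies only the number of null-transactions.

```python
from math import sqrt


def association_measures(n_both, n_m_only, n_c_only, n_null):
    """Compute lift and four null-invariant measures from 2x2 counts.

    n_both   -- transactions containing both milk and coffee
    n_m_only -- transactions containing milk but not coffee
    n_c_only -- transactions containing coffee but not milk
    n_null   -- transactions containing neither (null-transactions)
    """
    n_total = n_both + n_m_only + n_c_only + n_null
    sup_m = n_both + n_m_only          # support count of milk
    sup_c = n_both + n_c_only          # support count of coffee
    p_c_given_m = n_both / sup_m       # confidence of m => c
    p_m_given_c = n_both / sup_c       # confidence of c => m
    return {
        # lift depends on n_total, and therefore on n_null
        "lift": n_total * n_both / (sup_m * sup_c),
        # the remaining four use only n_both, sup_m, and sup_c
        "all_conf": min(p_c_given_m, p_m_given_c),
        "max_conf": max(p_c_given_m, p_m_given_c),
        "kulc": 0.5 * (p_c_given_m + p_m_given_c),
        "cosine": n_both / sqrt(sup_m * sup_c),
    }


# Hypothetical counts: identical purchases, different numbers of null-transactions.
few_nulls = association_measures(100, 1000, 1000, n_null=1_000)
many_nulls = association_measures(100, 1000, 1000, n_null=100_000)

for name in few_nulls:
    print(f"{name:9s} {few_nulls[name]:.4f} {many_nulls[name]:.4f}")
```

With these assumed counts, the four null-invariant measures are the same in both runs, while lift moves from below 1 to above 1 as null-transactions are added.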
“Among the all confidence, max confidence, Kulczynski, and cosine measures, which is best at indicating interesting pattern relationships?”
To answer this question, we introduce the imbalance ratio (IR), which assesses the imbalance of two itemsets, $A$ and $B$, in rule implications. It is defined as
$$
IR(A, B) = \frac{|\mathit{sup}(A) - \mathit{sup}(B)|}{\mathit{sup}(A) + \mathit{sup}(B) - \mathit{sup}(A \cup B)}, \tag{6.13}
$$
where the numerator is the absolute value of the difference between the support of the itemsets $A$ and $B$, and the denominator is the number of transactions containing $A$ or $B$. If the two directional implications between $A$ and $B$ are the same, then $IR(A, B)$ will be zero. Otherwise, the larger the difference between the two, the larger the imbalance ratio. This ratio is independent of the number of null-transactions and independent of the total number of transactions.
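As a small illustration of Eq. (6.13), the sketch below computes the imbalance ratio from support counts. The function name `imbalance_ratio`, its parameters, and the example counts are assumptions made for this illustration. Here `sup_ab` follows the notation of the equation: it is the support of the itemset $A \cup B$, that is, the number of transactions containing both $A$ and $B$.

```python
def imbalance_ratio(sup_a, sup_b, sup_ab):
    """Imbalance ratio IR(A, B) as in Eq. (6.13).

    sup_a  -- number of transactions containing A
    sup_b  -- number of transactions containing B
    sup_ab -- number of transactions containing both A and B
    """
    # Transactions containing A or B (inclusion-exclusion).
    n_a_or_b = sup_a + sup_b - sup_ab
    if n_a_or_b == 0:
        raise ValueError("neither itemset occurs in any transaction")
    return abs(sup_a - sup_b) / n_a_or_b


# Hypothetical counts: equal supports give IR = 0 (balanced case).
balanced = imbalance_ratio(sup_a=1100, sup_b=1100, sup_ab=100)

# Hypothetical counts: B is far more frequent than A (imbalanced case).
skewed = imbalance_ratio(sup_a=1100, sup_b=10100, sup_ab=100)

print(balanced, skewed)
```

Neither the number of null-transactions nor the total number of transactions is an argument of the function, which mirrors the independence stated above.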
Let's continue examining the remaining data sets in Example 6.10.
Example 6.11 Comparing null-invariant measures in pattern evaluation. Although the four measures introduced in this section are null-invariant, they may present dramatically