Similarly, in $D_3$, the four new measures correctly show that m and c are strongly
negatively associated because the $mc$ to $c$ ratio equals the $mc$ to $m$ ratio, that is,
100/1100 = 9.1%. However, lift and $\chi^2$ both contradict this in an incorrect way: Their
values for $D_2$ are between those for $D_1$ and $D_3$.
For data set $D_4$, both lift and $\chi^2$ indicate a highly positive association between
m and c, whereas the others indicate a “neutral” association because the ratio of $mc$ to
$\overline{m}c$ equals the ratio of $mc$ to $m\overline{c}$, which is 1. This means that if a
customer buys coffee (or milk), the probability that he or she will also purchase milk
(or coffee) is exactly 50%.
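The 50% figure follows directly from the ratio being 1. As a short worked step, writing $mc$ for the number of transactions containing both items and $m\overline{c}$ for those containing milk but not coffee (this notation for the contingency counts is assumed here):

$$P(c \mid m) = \frac{mc}{mc + m\overline{c}} = \frac{1}{1 + m\overline{c}/mc} = \frac{1}{1 + 1} = 50\%$$

The same argument with $\overline{m}c$ in place of $m\overline{c}$ gives $P(m \mid c) = 50\%$.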
“Why are lift and $\chi^2$ so poor at distinguishing pattern association relationships in
the previous transactional data sets?” To answer this, we have to consider the
null-transactions. A null-transaction is a transaction that does not contain any of the
itemsets being examined. In our example, $\overline{mc}$ represents the number of
null-transactions. Lift and $\chi^2$ have difficulty distinguishing interesting pattern
association relationships because they are both strongly influenced by $\overline{mc}$.
Typically, the number of null-transactions can outweigh the number of individual
purchases because, for example, many people may buy neither milk nor coffee. On the
other hand, the other four measures are good indicators of interesting pattern
associations because their definitions remove the influence of $\overline{mc}$ (i.e.,
they are not influenced by the number of null-transactions).
This discussion shows that it is highly desirable to have a measure that has a value
that is independent of the number of null-transactions. A measure is null-invariant if
its value is free from the influence of null-transactions. Null-invariance is an
important property for measuring association patterns in large transaction databases.
Among the six discussed measures in this subsection, only lift and $\chi^2$ are not
null-invariant measures.
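Null-invariance can be checked numerically. The sketch below is illustrative only: it assumes the standard textbook definitions of the six measures for a 2 x 2 contingency table, and the function name `measures`, its parameter names, and the counts used are hypothetical (the counts are chosen so that $mc = \overline{m}c = m\overline{c}$, as in the $D_4$ discussion). Only the number of null-transactions differs between the two calls.

```python
from math import sqrt


def measures(mc, m_only, c_only, null):
    """Compute six association measures from 2x2 contingency counts.

    mc      -- transactions containing both milk and coffee
    m_only  -- transactions containing milk but not coffee
    c_only  -- transactions containing coffee but not milk
    null    -- null-transactions (neither milk nor coffee)
    """
    n = mc + m_only + c_only + null      # total number of transactions
    sup_m = mc + m_only                  # support count of milk
    sup_c = mc + c_only                  # support count of coffee
    p_c_given_m = mc / sup_m
    p_m_given_c = mc / sup_c
    # Pearson chi-square for a 2x2 table (assumed standard closed form).
    chi2 = n * (mc * null - m_only * c_only) ** 2 / (
        sup_m * sup_c * (c_only + null) * (m_only + null)
    )
    return {
        "lift": mc * n / (sup_m * sup_c),
        "chi2": chi2,
        "all_conf": min(p_c_given_m, p_m_given_c),
        "max_conf": max(p_c_given_m, p_m_given_c),
        "kulc": 0.5 * (p_c_given_m + p_m_given_c),
        "cosine": sqrt(p_c_given_m * p_m_given_c),
    }


# Hypothetical counts: identical except for the number of null-transactions.
small = measures(mc=1000, m_only=1000, c_only=1000, null=1000)
large = measures(mc=1000, m_only=1000, c_only=1000, null=100_000)

for name in small:
    print(f"{name:9s} {small[name]:12.3f} {large[name]:12.3f}")
```

Under these assumptions, the four null-invariant measures print the same value in both columns, while lift and $\chi^2$ change with the null-transaction count, because only those two depend on the total `n` or on `null`.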
“Among the all confidence, max confidence, Kulczynski, and cosine measures, which
is best at indicating interesting pattern relationships?”
To answer this question, we introduce the imbalance ratio (IR), which assesses the
imbalance of two itemsets, A and B, in rule implications. It is defined as

$$IR(A, B) = \frac{|sup(A) - sup(B)|}{sup(A) + sup(B) - sup(A \cup B)}, \qquad (6.13)$$

where the numerator is the absolute value of the difference between the support of the
itemsets A and B, and the denominator is the number of transactions containing A or
B. If the two directional implications between A and B are the same, then IR(A, B) will
be zero. Otherwise, the larger the difference between the two, the larger the imbalance
ratio. This ratio is independent of the number of null-transactions and independent of
the total number of transactions.
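As a minimal sketch of the imbalance ratio defined above, the function below takes the three support counts directly. The name `imbalance_ratio`, its parameters, and the sample counts are hypothetical; `sup_ab` stands for the number of transactions containing both A and B.

```python
def imbalance_ratio(sup_a, sup_b, sup_ab):
    """Imbalance ratio of itemsets A and B from their support counts.

    sup_a  -- number of transactions containing A
    sup_b  -- number of transactions containing B
    sup_ab -- number of transactions containing both A and B
    """
    # Transactions containing A or B (inclusion-exclusion).
    containing_either = sup_a + sup_b - sup_ab
    if containing_either == 0:
        raise ValueError("neither itemset occurs in any transaction")
    return abs(sup_a - sup_b) / containing_either


# Balanced case (hypothetical counts): both supports are equal, so IR is 0.
balanced = imbalance_ratio(sup_a=2000, sup_b=2000, sup_ab=1000)

# Skewed case (hypothetical counts): B is far more frequent than A.
skewed = imbalance_ratio(sup_a=1100, sup_b=10_100, sup_ab=100)

print(balanced, skewed)
```

Neither the total number of transactions nor the number of null-transactions appears among the arguments, which reflects the independence property stated above.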
Let's continue examining the remaining data sets in Example 6.10.
Example 6.11 Comparing null-invariant measures in pattern evaluation. Although the four
measures introduced in this section are null-invariant, they may present dramatically
 