is considered a negative (correlated) pattern. In car sales data, a dealer sells a few
fuel-thirsty vehicles (e.g., SUVs) to a given customer, and then later sells hybrid
mini-cars to the same customer. Even though buying SUVs and buying hybrid mini-cars may
be negatively correlated events, it can be interesting to discover and examine such
exceptional cases.
An infrequent (or rare) pattern is a pattern with a frequency support that is below
(or far below) a user-specified minimum support threshold. However, since the
occurrence frequencies of the majority of itemsets are usually below or even far below
the minimum support threshold, it is desirable in practice for users to specify other
conditions for rare patterns. For example, if we want to find patterns containing at
least one item with a value that is over $500, we should specify such a constraint
explicitly. Efficient mining of such itemsets is discussed under mining multidimensional
associations (Section 7.2.1), where the strategy is to adopt multiple (e.g., item- or
group-based) minimum support thresholds. Other applicable methods are discussed
under constraint-based pattern mining (Section 7.3), where user-specified constraints
are pushed deep into the iterative mining process.
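As a rough illustration of pairing a rarity condition with an explicit constraint, the following sketch enumerates small itemsets by brute force and keeps those that are rare and contain at least one item priced above a threshold. The function name, the toy transactions, the prices, and the `max_size` cap are all hypothetical, and the exhaustive enumeration is only for illustration; it is not the efficient mining strategy of Section 7.2.1 or Section 7.3.

```python
from itertools import combinations


def rare_itemsets_with_expensive_item(transactions, prices, min_sup,
                                      price_floor=500, max_size=2):
    """Return {itemset: support} for itemsets that occur at least once,
    have support below min_sup, and contain an item priced above price_floor.

    Brute-force enumeration up to max_size items; illustrative only.
    """
    n = len(transactions)
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1

    result = {}
    for itemset, count in counts.items():
        sup = count / n
        # Rarity condition plus the explicit user-specified constraint.
        if sup < min_sup and any(prices[i] > price_floor for i in itemset):
            result[itemset] = sup
    return result


# Hypothetical toy data, not taken from the text.
transactions = [
    {"laptop", "mouse"},
    {"mouse", "pad"},
    {"mouse", "pad"},
    {"mouse", "cable"},
    {"camera", "pad"},
]
prices = {"laptop": 900, "mouse": 20, "pad": 10, "cable": 5, "camera": 650}

rare = rare_itemsets_with_expensive_item(transactions, prices, min_sup=0.4)
```

In this toy data the itemsets {cable} and {cable, mouse} are just as rare as the others, but they are dropped because neither contains an item priced above 500, which is the effect the explicit constraint is meant to have.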
There are various ways we could define a negative pattern. We will consider three
such definitions.
Definition 7.1: If itemsets X and Y are both frequent but rarely occur together (i.e.,
\( \mathit{sup}(X \cup Y) < \mathit{sup}(X) \times \mathit{sup}(Y) \)), then itemsets X and Y are
negatively correlated, and the pattern \( X \cup Y \) is a negatively correlated pattern. If
\( \mathit{sup}(X \cup Y) \ll \mathit{sup}(X) \times \mathit{sup}(Y) \),
then X and Y are strongly negatively correlated, and the pattern \( X \cup Y \) is a strongly
negatively correlated pattern.
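A minimal sketch of how the test in Definition 7.1 might be applied to a list of transactions is given below. The helper names and the toy data are hypothetical. The definition gives no numeric meaning for "much less than", so the `strong_ratio` cutoff used here is purely an assumption made for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def check_definition_7_1(transactions, x, y, min_sup, strong_ratio=0.1):
    """Apply the support-based test of Definition 7.1 to itemsets x and y.

    strong_ratio is an assumed stand-in for "much less than": the pair is
    flagged as strongly negative when sup(X u Y) < strong_ratio * sup(X) * sup(Y).
    """
    sup_x = support(transactions, x)
    sup_y = support(transactions, y)
    sup_xy = support(transactions, set(x) | set(y))
    expected = sup_x * sup_y

    both_frequent = sup_x >= min_sup and sup_y >= min_sup
    return {
        "sup_xy": sup_xy,
        "expected": expected,
        "negative": both_frequent and sup_xy < expected,
        "strong": both_frequent and sup_xy < strong_ratio * expected,
    }


# Hypothetical toy data: 10 transactions, SUVs and hybrids rarely bought together.
transactions = (
    [{"suv"}] * 4 + [{"hybrid"}] * 4 + [{"suv", "hybrid"}] + [{"tires"}]
)
verdict = check_definition_7_1(transactions, {"suv"}, {"hybrid"}, min_sup=0.3)
```

Here both itemsets have support 0.5, so the product is 0.25 while the joint support is 0.1. The pair is flagged as negatively correlated, but not as strongly so under the assumed cutoff.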
This definition can easily be extended for patterns containing k-itemsets for k > 2.
A problem with the definition, however, is that it is not null-invariant. That is, its
value can be misleadingly influenced by null-transactions, where a null-transaction is a
transaction that does not contain any of the itemsets being examined (Section 6.3.3).
This is illustrated in Example 7.4.
Example 7.4 Null-transaction problem with Definition 7.1. If there are a lot of null-transactions in
the data set, then the number of null-transactions rather than the patterns observed may
strongly influence a measure's assessment as to whether a pattern is negatively correlated.
For example, suppose a sewing store sells needle packages A and B. The store sold 100
packages each of A and B, but only one transaction contains both A and B. Intuitively,
A is negatively correlated with B since the purchase of one does not seem to encourage
the purchase of the other.
Let's see how the above Definition 7.1 handles this scenario. If there are 200
transactions, we have \( \mathit{sup}(A \cup B) = 1/200 = 0.005 \) and
\( \mathit{sup}(A) \times \mathit{sup}(B) = 100/200 \times 100/200 = 0.25 \). Thus,
\( \mathit{sup}(A \cup B) \ll \mathit{sup}(A) \times \mathit{sup}(B) \), and so Definition 7.1
indicates that A and B are strongly negatively correlated. What if, instead of only
200 transactions in the database, there are \( 10^6 \)? In this case, there are many
null-transactions, that is, many contain neither A nor B. How does the definition hold up?
It computes \( \mathit{sup}(A \cup B) = 1/10^6 \) and
\( \mathit{sup}(A) \times \mathit{sup}(B) = 100/10^6 \times 100/10^6 = 1/10^8 \).
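The arithmetic of Example 7.4 can be checked with a short script. The sketch below uses exact fractions and the counts stated in the example (100 transactions containing A, 100 containing B, and 1 containing both); the function name is hypothetical.

```python
from fractions import Fraction


def definition_7_1_terms(n_total, n_a, n_b, n_ab):
    """Return (sup(A u B), sup(A) * sup(B)) as exact fractions."""
    sup_ab = Fraction(n_ab, n_total)
    expected = Fraction(n_a, n_total) * Fraction(n_b, n_total)
    return sup_ab, expected


# Counts from Example 7.4; only the database size changes between the two runs.
small = definition_7_1_terms(200, 100, 100, 1)
large = definition_7_1_terms(10**6, 100, 100, 1)

for label, (sup_ab, expected) in (("200 transactions", small),
                                  ("10^6 transactions", large)):
    print(label, "sup(A u B) =", sup_ab, " sup(A)*sup(B) =", expected,
          " sup(A u B) < product:", sup_ab < expected)
```

With 200 transactions the joint support (1/200) is below the product (1/4), whereas with 10^6 transactions the joint support (1/10^6) exceeds the product (1/10^8). The counts for A and B are identical in both runs, so the change in the comparison is driven entirely by the added null-transactions.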
 