Database Reference
In-Depth Information
Table 6.1
Transactional
database-positive and
negative items
TID
Original TD
Augmented TD
1
A,C,D
A
,
¬
B
,
C
,
D
,
¬
E
2
B,C
¬
A
,
B
,
C
,
¬
D
,
¬
E
3
C
¬
A
,
¬
B
,
C
,
¬
D
,
¬
E
4
A,B,E
A
,
B
,
¬
C
,
¬
D
,
E
5
A,C,D
A
,
¬
B
,
C
,
D.
¬
E
Definition 1 (Association Rule)
An
association rule
is an implication of the form
“
X
⇒
⊆
I
⊆
I
∩
=∅
Y
”, where
X
,
Y
, and
X
Y
.
Definition 2
(Support)
The rule
X
⇒
Y
has a
support s
in the transaction set
D
if
s
% of the transactions in
Y
. In other words, the support of the rule is
the probability that
X
and
Y
hold together among all the possible presented cases.
D
contain
X
∪
Definition 3
(Confidence)
The rule
X
⇒
Y
holds in the transaction set
D
with
confidence c
if
c
% of transactions in
that contain
X
also contain
Y
. In other words,
the confidence of the rule is the conditional probability that the consequent
Y
is true
under the condition of the antecedent
X
.
The problem of discovering all association rules from a set of transactions
D
D
consists of generating the rules that have a
support
and
confidence
greater than given
thresholds.
Definition 4
(Negative Item and Positive Item)
A negative item is defined as
¬
i
k
, meaning that item
i
k
is absent from a transaction
T
. The support of
¬
i
k
is
s
(
s
(
i
k
).
i
k
, a positive item, is an item that is present in a transaction.
Definition 5 (Negative Association Rule)
A negative association rule is an impli-
cation of the form
X
¬
i
k
)
=
1
−
⇒
Y
, where
X
⊆
I
,
Y
⊆
I
, and
X
∩
Y
=∅
and
X
and/or
Y
contain at least one negative item.
Definition 6 Negative Associations between Itemsets
A negative association be-
tween two positive itemsets
X
,
Y
are rules of the following forms
¬
X
⇒
Y
,
X
⇒¬
Y
and
Y
.
Table
6.1
shows a toy transactional database with 5 transactions and 5 items. “Orig-
inal TD” column shows the items present in each transaction, while “Augmented TD”
column shows both present and absent items.
Mining association rules from a transactional database that contains information
about both present and absent items is computationally expensive due to the following
reasons:
¬
X
⇒¬
1. The number of items in the transactional database swells when their negative
counterparts are added to a transactional database. The maximum number of
patterns that can be found in a transactional database with
d
items is 2
d
−
1. The
number of items in the “Original TD” in Table
6.1
is
n
5. Even for the small set
in Table
6.1
, the number of itemsets jumps dramatically from 31 to 1023 when
the negative items are added.
=