Database Reference
In-Depth Information
Table 2.1: Notation table.
Notation Description
I =fi 1 ; i 2 ; : : : ; i M g Universe of literals (items) with cardinality M
jRj Cardinality of a set R
I; J; : : : Itemsets produced from I
k-itemset An itemset of length (equiv. size) k
T n = (tid; I) A transaction with unique identifier tid and itemset I
D =fT 1 ; T 2 ; : : : ; T N g A database consisting of N transactions
T nm
The m-th item of the n-th transaction in a database
I ) J
An association rule between itemsets I and J
D I
The supporting transactions of itemset I in database D
Ã( )
The powerset of a set of literals
P =Ã(I)
The set of all possible itemsets (patterns) extracted from I
sup(I; D), sup(I)
Support of itemset I in database D
freq(I; D), freq(I)
Frequency of itemset I in database D
msup; mfreq; mconf
Minimum support/frequency/confidence threshold
F D
Set of all frequent itemsets in database D
Bd + (F D );Bd +
Positive border of F D
Bd (F D );Bd
Negative border of F D
Bd(F D )
Border of F D
S
Set of sensitive itemsets (patterns)
R
Set of mined association rules
R S
Set of sensitive association rules from R
Association rule mining, introduced by Agrawal, et al. [5, 7] in 1993, operates
by first mining all the itemsets that are frequent in the database and then by using
these itemsets to derive association rules that are strong enough to be considered as
interesting. The process of frequent itemset mining is defined as follows 1 : Let I =
fi 1 ; i 2 ; : : : ; i M gbe a finite set of literals, called items, where M denotes the cardinality
of the set. Any subset I I is called an itemset over I. A k-itemset is an itemset of
length (equiv. of size) k, i.e. an itemset consisting of k items. A transaction T n over
I is a pair T n = (tid; I), where I is the itemset and tid is a unique identifier, used to
distinguish among transactions that correspond to the same itemset. A transactional
database D =fT 1 ; T 2 ; : : : ; T N g over I is a N M table consisting of N transactions
over I carrying different identifiers, where entry T nm = 1 if and only if the m-th
item (m 2 [1; M]) appears in the n-th transaction (n 2 [1; N]). Otherwise, T nm = 0. A
transaction T = (tid; J) is said to support an itemset I over I, if I J. Let D I denote
the supporting transactions of itemset I in database D. Furthermore, let S be a set
of items. NotationÃ(S) denotes the powerset of S, which is the set of all subsets of
S. Given the universe of all items I in D, we use notation P =Ã(I) to refer to all
possible itemsets that can be produced from I.
Given an itemset I over I in D, sup(I; D) denotes the number of transactions
T 2 D that support I and freq(I; D) denotes the fraction of transactions in D that
1 The reader can refer to the work of Agrawal, et al. [5, 7] for a more detailed presentation of the
association rule mining framework as well as for a set of computationally efficient algorithms for
the derivation of the association rules. Moreover, [68] provides very good coverage of this topic.
 
Search WWH ::




Custom Search