Databases Reference
In-Depth Information
customers' buying habits by searching for itemsets that are frequently purchased
together (or in sequence).
Association rule mining consists of first finding frequent itemsets (sets of items,
such as A and B , satisfying a minimum support threshold , or percentage of the task-
relevant tuples), from which strong association rules in the form of A ) B are
generated. These rules also satisfy a minimum confidence threshold (a prespecified
probability of satisfying B under the condition that A is satisfied). Associations can be
further analyzed to uncover correlation rules , which convey statistical correlations
between itemsets A and B .
Many efficient and scalable algorithms have been developed for frequent itemset
mining , from which association and correlation rules can be derived. These algo-
rithms can be classified into three categories: (1) Apriori-like algorithms , (2) frequent
pattern growth - based algorithms such as FP-growth, and (3) algorithms that use the
vertical data format .
The Apriori algorithm is a seminal algorithm for mining frequent itemsets for
Boolean association rules. It explores the level-wise mining Apriori property that all
nonempty subsets of a frequent itemset must also be frequent. At the k th iteration (for
k 2), it forms frequent k -itemset candidates based on the frequent
-itemsets,
and scans the database once to find the complete set of frequent k -itemsets, L k .
Variations involving hashing and transaction reduction can be used to make the
procedure more efficient. Other variations include partitioning the data (mining on
each partition and then combining the results) and sampling the data (mining on
a data subset). These variations can reduce the number of data scans required to as
little as two or even one.
Frequent pattern growth is a method of mining frequent itemsets without candidate
generation. It constructs a highly compact data structure (an FP-tree ) to compress the
original transaction database. Rather than employing the generate-and-test strategy of
Apriori-like methods, it focuses on frequent pattern (fragment) growth, which avoids
costly candidate generation, resulting in greater efficiency.
Mining frequent itemsets using the vertical data format (Eclat) is a method that
transforms a given data set of transactions in the horizontal data format of TID-
itemset into the vertical data format of item-TID set . It mines the transformed
data set by TID set intersections based on the Apriori property and additional
optimization techniques such as diffset .
Not all strong association rules are interesting. Therefore, the support-confidence
framework should be augmented with a pattern evaluation measure, which promotes
the mining of interesting rules. A measure is null-invariant if its value is free from
the influence of null-transactions (i.e., the transactions that do not contain any of
the itemsets being examined ). Among many pattern evaluation measures, we exam-
ined lift ,
.
k 1
/
2 , all confidence, max confidence, Kulczynski , and cosine , and showed
 
Search WWH ::




Custom Search