Database Reference
In-Depth Information
refers to a collection of items or individual entities that contain some kind of
relationship. This could be a set of retail items purchased together in one
transaction, a set of hyperlinks clicked on by one user in a single session, or a set
of tasks done in one day. An itemset containing k items is called a k-itemset . This
chapter uses curly braces like {item 1 ,item 2 ,… item k } to denote a k-itemset.
Computation of the association rules is typically based on itemsets.
The research of association rules started as early as the 1960s. Early research by
Hájek et al. [1] introduced many of the key concepts and approaches of association
rule learning, but it focused on the mathematical representation rather than the
algorithm. The framework of association rule learning was brought into the
database community by Agrawal et al. [2] in the early 1990s for discovering
regularities between products in a large database of customer transactions
recorded by point-of-sale systems in supermarkets. In later years, it expanded to
web contexts, such as mining path traversal patterns [3] and usage patterns [4] to
facilitate organization of web pages.
This chapter chooses Apriori as the main focus of the discussion of association
rules. Apriori [5] is one of the earliest and the most fundamental algorithms
for generating association rules. It pioneered the use of support for pruning the
itemsets and controlling the exponential growth of candidate itemsets. Shorter
candidate itemsets, which are known to be frequent itemsets, are combined and
pruned to generate longer frequent itemsets. This approach eliminates the need for
all possible itemsets to be enumerated within the algorithm, since the number of
all possible itemsets can become exponentially large.
One major component of Apriori is support. Given an itemset L , the support
[2] of L is the percentage of transactions that contain L . For example, if 80% of
all transactions contain itemset {bread} , then the support of {bread} is 0.8.
Similarly, if 60% of all transactions contain itemset {bread,butter} , then the
support of {bread,butter} is 0.6.
A frequent itemset has items that appear together often enough. The term
“often enough” is formally defined with a minimum support criterion. If the
minimum support is set at 0.5, any itemset can be considered a frequent itemset if
at least 50% of the transactions contain this itemset. In other words, the support
of a frequent itemset should be greater than or equal to the minimum support.
For the previous example, both {bread} and {bread,butter} are considered
frequent itemsets at the minimum support 0.5. If the minimum support is 0.7, only
{bread} is considered a frequent itemset.
Search WWH ::




Custom Search