be generated is typically restricted by minimal thresholds for the rule quality
measures support and confidence, minsupp and minconf respectively.
This restriction allows [3] to split the problem into two separate parts: an
itemset X is called frequent if supp(X) ≥ minsupp. Once
F = {X ⊆ I | supp(X) ≥ minsupp},
the set of all frequent itemsets together with the corresponding support values
is known, deriving the desired association rules is straightforward: for every
X ∈ F one has to check the confidence of all rules
X \ Y → Y
with ∅ ≠ Y ⊊ X.
The so-called downward closure property of itemset support ensures that
we can actually compute all necessary confidence values, cf. [3]. This property
states that all subsets of a frequent itemset must also be frequent. So if X is
frequent, we also know the support of all X \ Y with ∅ ≠ Y ⊊ X.
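
To make this rule-derivation step concrete, the following sketch (in Python;
the function and variable names are ours, it is not the code of [3]) assumes
the frequent itemsets are already available as a dictionary frequent that maps
each itemset, stored as a frozenset, to its support value:

from itertools import combinations

def generate_rules(frequent, minconf):
    # frequent: dict mapping frozenset itemsets to their support values.
    # By downward closure, supp(X \ Y) is available in `frequent` for
    # every non-empty proper subset Y of a frequent itemset X.
    rules = []
    for X, supp_X in frequent.items():
        if len(X) < 2:
            continue  # antecedent and consequent must both be non-empty
        for r in range(1, len(X)):  # enumerate consequents Y with ∅ ≠ Y ⊊ X
            for Y in map(frozenset, combinations(X, r)):
                conf = supp_X / frequent[X - Y]  # conf(X \ Y → Y)
                if conf >= minconf:
                    rules.append((X - Y, Y, conf))
    return rules

Because every X \ Y is itself frequent, the lookup frequent[X - Y] is
guaranteed to succeed; this is exactly what the downward closure property
buys us.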
What remains to be done before rule generation is to determine F, the set of
all frequent itemsets. Unfortunately, for obvious reasons we are not able to
look at all subsets of I: a linearly growing n = |I| still implies an
exponentially growing number of subsets to be taken into consideration.
4.2 The Generation of Frequent Itemsets
In the beginning of the mining run each itemset X ⊆ I is potentially frequent.
In other words, the initial search space consists of the power set of I without
the empty set, P(I) \ {∅}. Therefore, even for rather small |I| the search
space easily exceeds all limits. For the set of items I = {a, b, c, d, e} this
search space is shown in Figure 2.
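
As a quick illustration of this growth, the following snippet (Python; the
variable names are ours) enumerates that search space for I = {a, b, c, d, e}
and confirms it already contains 2^5 − 1 = 31 itemsets:

from itertools import chain, combinations

items = ['a', 'b', 'c', 'd', 'e']

# all non-empty subsets of I, i.e. the search space P(I) \ {∅}
search_space = list(chain.from_iterable(
    combinations(items, r) for r in range(1, len(items) + 1)))

print(len(search_space))  # 31 == 2**5 - 1; doubles with each added item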
In order to avoid traversing the whole search space, modern association min-
ing algorithms employ a candidate generation and test approach. The idea is to
generate a manageable set of potentially frequent itemsets, the so-called
candidates. Then the support values of these candidates are determined based
on the database D. Candidate generation always takes all information on the
frequency and infrequency of already investigated candidates into account. In
brief, the common strategy is as follows: from the downward closure property
of itemset support we know that all subsets of a frequent itemset must also be
frequent. This allows us to immediately prune from the search space all
candidates that have at least one infrequent subset, since they cannot be
frequent themselves. After candidate generation the remaining candidates are
counted against the database and the algorithm proceeds to the next iteration.
The whole process stops as soon as there are no more potentially frequent
itemsets that have not yet been considered as candidates.
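
A minimal sketch of this level-wise candidate generation and test strategy, in
Python and under the assumption that the database D is given as a list of
transactions, each a set of items (the helper names are hypothetical, not the
code of [3]):

from itertools import combinations

def apriori(transactions, minsupp):
    # transactions: list of sets of items; minsupp: relative threshold.
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # start with the frequent 1-itemsets
    frequent = {}
    for item in {i for t in transactions for i in t}:
        s = support(frozenset([item]))
        if s >= minsupp:
            frequent[frozenset([item])] = s

    current, k = set(frequent), 2
    while current:
        # generate: join frequent (k-1)-itemsets into k-candidates ...
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # ... and prune every candidate with an infrequent (k-1)-subset,
        # as licensed by the downward closure property
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # test: count the surviving candidates against the database
        current = set()
        for c in candidates:
            s = support(c)
            if s >= minsupp:
                frequent[c] = s
                current.add(c)
        k += 1
    return frequent

Combined with the generate_rules sketch above, calling apriori(D, minsupp)
and then generate_rules(frequent, minconf) realises the two-part split of the
mining problem described above.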
In Figure 2 the thick border separates the frequent itemsets in the upper
part from the infrequent itemsets in the lower part for a hypothetical support
threshold minsupp. The existence of such a border is guaranteed by the downward
closure property of itemset support. Clearly, the proposed stepwise traversal of
the search space should start at the top with all or part of the frequent 1-itemsets