Database Reference
In-Depth Information
problem is known as the pattern explosion, and stems from the interplay of using a
monotonic interestingness measure, and asking for all frequent patterns.
We find many attempts in the literature aiming to solve the pattern explosion,
roughly divided between three main approaches. The first is to attempt to condense
the set of results. That is, we only want those patterns reported such that we can
infer (to certain extend) the complete set of frequent patterns. These interestingness
measures can hence also be regarded as extra constraints in addition to the frequency
constraint.
Maximal Frequent Itemsets In this vein, Bayardo proposed to mine maximal fre-
quent itemsets [ 6 ]: itemsets that are frequent and which cannot be extended without
their support dropping below the threshold. In the lattice, this comes down to re-
porting only the longest frequent pattern in each branch. In our example, the set of
maximal frequent itemsets for minsup
. Maximal frequent
itemsets are a lossy representation of the set of frequent itemsets in that all frequent
itemsets can be reconstructed, yet the individual frequencies are lost. While maximal
itemsets can be useful when we are interested in long patterns, we should be aware
that for very low support thresholds complete data records are returned—which beats
the purpose of pattern mining. Maximal frequent itemsets can be mined efficiently
using, e.g., the Max-Miner [ 6 ] and MAFIA [ 11 ] algorithms.
Closed Frequent Itemsets In contrast to maximal frequent itemsets, closed fre-
quent itemsets [ 52 ] provide a lossless representation of the frequent itemsets, as both
these itemsets and their frequencies can be reconstructed exactly. The definition of a
closed frequent itemset is an itemset X that is frequent, supp ( X )
=
2is
F max ={
abc , abd
}
minsup , and of
which there exists no extension for which the support remains the same, i.e., there is
no Y
supp ( X ). Following this definition, in our example,
the set of closed frequent itemsets consists of
X such that supp ( Y )
=
, which
is smaller than the complete set of frequent itemsets, yet larger than for maximal
itemsets. Efficient algorithms for mining closed frequent itemsets include Charm
[ 77 ].
Given a set of closed frequent itemsets
F closed
={
a , ab , abc , abd
}
F closed , we can determine the support of
any frequent itemset X
F
with ease. That is, for a given itemset X , we simply find
the smallest superset Y
F closed , with X
Y , and return the support of Y .Ifno
such superset exists in
F closed , X is not frequent. As such, we essentially derive the
frequency of X using a very simple rule.
Free Frequent Itemsets Closed itemsets can be seen as the maximal itemsets having
among the itemsets having the same support. Closely related are free sets [ 9 ], which
are the minimal itemsets among the itemsets having the same support, that is, an
itemset X is free if there is no Y
supp ( Y ). Each free
itemset X has a unique closure Y , a closed itemset Y such that X
X such that supp ( X )
=
Y . However, a
closed itemset may stem from many free itemsets. This means that free itemsets will
always be a larger collection than closed itemsets. Free itemsets are handy since they
form a monotonically downward closed collection, that is, all sub-itemsets of a free
Search WWH ::




Custom Search