Database Reference
In-Depth Information
itemsets, but the test condition is more complex. For an itemset I , its will be dropped
from the time window [0 : m ]if
1. The support for I within each individual time window between m and n must be
below θ .
2. The support for I within each subsequence of time windows between m and n
must be below .
3.3
Closed and Maximal Itemsets
An alternative to mining for all frequent itemsets (FI) is to look for closed frequent
itemsets (CFI) or maximal frequent itemsets (MFI). A frequent itemset I is closed if
every proper superset of I has a lesser support; that is, s ( I ) <s ( I ), for all I
I .
A frequent itemset I is maximal if every proper superset of I is not frequent, or
s ( I ) , for all I
I . MFIs sit at the boundary between frequent and infrequent
itemsets. They are usually at least an order of magnitude fewer in number than FIs,
so they offer substantial memory savings. However, one cannot recover the set of FIs
from the set of MFIs, so there is a loss of information. CFIs, on the hand, are more
numerous. While they do not offer as great a reduction in memory consumption, the
set of CFIs and their supports completely specify the set of FIs and their supports.
Thus, computing and storing only CFIs can represent a substantial memory and
computational savings for frequent itemset mining.
Closed Frequent Itemsets Chi et al. produced MOMENT, the first algorithm for
mining closed frequent itemsets from a data stream, choosing to use a sliding data
window [ 23 ]. They utilize a closed enumerated tree (CET) with lexicographically
ordered items, recording both CFIs and those itemsets at the boundary between
CFIs and other itemsets. This boundary could change with each new transaction.
In practice, however, itemsets tend to change status slowly, so the tree's structure
does not change often. Figure 9.1 shows the CET that corresponds to the following
data with θ
( A , B , C ). To
maintain the necessary information, the tree's nodes are classified into four different
categories.
=
0 . 5: T 1
=
( C , D ), T 2
=
( A , B ), T 3
=
( A , B , C ), T 4
=
￿
Infrequent gateway node (solid circle): I is infrequent AND I is the union of I 's
frequent parent and another frequent itemset.
￿
Unpromising gateway node (dotted rectangle): I is frequent AND I has a
descendant which is a CFI with the same support as I .
￿
Intermediate node (no border): I is frequent AND has a child with the same
support AND I is not a unpromising gateway node.
￿
Closed node (solid rectangle): I is a closed frequent itemset.
By maintaining counts for these four types of itemsets, MOMENT is able to track the
exact set of CFIs.
Search WWH ::




Custom Search