Database Reference
In-Depth Information
CHARM
tries to extend it by another item
Q
from the same set and applies three
conditions for pruning. If the newly create itemset by extension is frequent,
CHARM
performs closure-checking to identify whether the itemset is closed.
CHARM
also
updates the set
I
accordingly. In other words, it replaces
P
with
P
Q
, if the
corresponding pruning condition is met. If the set
I
is the not empty, then
CHARM
is called recursively.
∪
5.3.3
CLOSET and CLOSET+
CLOSET
[
53
] and
CLOSET+
[
69
] frequent closed itemset mining algorithms are
inspired by the
FP-growth
method. The
CLOSET
algorithm makes use of the prin-
ciples of the FP-Tree data structure to avoid the candidate generation step during
the process of mining frequent closed itemsets. This work introduces a technique,
referred to as single prefix path compression, that quickly assists the mining process.
CLOSET
also applies partition-based projection mechanisms for better scalability.
The mining procedure of
CLOSET
follows the
FP-growth
algorithm. However, the al-
gorithm is able to extract only the closed patterns by careful book-keeping.
CLOSET
treats items appearing in every transaction of the conditional database specially. For
example, if
Q
is the set of items that appear in every transaction of the
P
conditional
database then
P
Q
creates a frequent closed itemset if it is not a proper subset of
any frequent closed itemset with the equal support.
CLOSET
also prunes the search
space. For example, if
P
and
Q
are frequent itemset with the equal support where
Q
is also a closed itemset and
P
⊂
Q
, then it does not mine the conditional database
of
P
because the latter will not produce any frequent closed itemsets.
CLOSET+
is a follow-up work after
CLOSET
by the same group of authors.
CLOSET+
attempts to design the most optimized frequent closed itemset mining
algorithm by finding the best trade-off between depth-first search versus breadth-
first search, vertical formats versus horizontal formats, tree structure versus other
data structures, top-down versus bottom-up traversal, and pseudo projection ver-
sus physical projection of the conditional database.
CLOSET+
keeps track of the
unpromising prefix itemsets for generating potential closed frequent itemsets and
prunes the search space by deleting them.
CLOSET+
also applies “item merging,”
and “sub-itemset” based pruning. To save the memory of the closure checking opera-
tion,
CLOSET+
uses the combination of the 2-level hash-indexed tree based method
and the pseudo-projection based upward checking method. Interested readers are
encouraged to refer to [
69
] for more details.
∪
5.3.4
DCI_CLOSED
DCI_CLOSED
[
41
,
42
] uses a bitwise vertical representation of the input database.
DCI_CLOSED
can be executed independently on each partition of the database in
any order and, thus, also in parallel.
DCI_CLOSED
is designed to improve memory-
efficiency by avoiding the storage of duplicate closed itemsets.
DCI_CLOSED
designs a novel strategy for searching the lattice that can detect and discard du-
plicate closed patterns on the fly. Using the concept of order-preserving generators