Database Reference
In-Depth Information
CHARM tries to extend it by another item Q from the same set and applies three
conditions for pruning. If the newly create itemset by extension is frequent, CHARM
performs closure-checking to identify whether the itemset is closed. CHARM also
updates the set I accordingly. In other words, it replaces P with P
Q , if the
corresponding pruning condition is met. If the set I is the not empty, then CHARM
is called recursively.
5.3.3
CLOSET and CLOSET+
CLOSET [ 53 ] and CLOSET+ [ 69 ] frequent closed itemset mining algorithms are
inspired by the FP-growth method. The CLOSET algorithm makes use of the prin-
ciples of the FP-Tree data structure to avoid the candidate generation step during
the process of mining frequent closed itemsets. This work introduces a technique,
referred to as single prefix path compression, that quickly assists the mining process.
CLOSET also applies partition-based projection mechanisms for better scalability.
The mining procedure of CLOSET follows the FP-growth algorithm. However, the al-
gorithm is able to extract only the closed patterns by careful book-keeping. CLOSET
treats items appearing in every transaction of the conditional database specially. For
example, if Q is the set of items that appear in every transaction of the P conditional
database then P
Q creates a frequent closed itemset if it is not a proper subset of
any frequent closed itemset with the equal support. CLOSET also prunes the search
space. For example, if P and Q are frequent itemset with the equal support where Q
is also a closed itemset and P Q , then it does not mine the conditional database
of P because the latter will not produce any frequent closed itemsets.
CLOSET+ is a follow-up work after CLOSET by the same group of authors.
CLOSET+ attempts to design the most optimized frequent closed itemset mining
algorithm by finding the best trade-off between depth-first search versus breadth-
first search, vertical formats versus horizontal formats, tree structure versus other
data structures, top-down versus bottom-up traversal, and pseudo projection ver-
sus physical projection of the conditional database. CLOSET+ keeps track of the
unpromising prefix itemsets for generating potential closed frequent itemsets and
prunes the search space by deleting them. CLOSET+ also applies “item merging,”
and “sub-itemset” based pruning. To save the memory of the closure checking opera-
tion, CLOSET+ uses the combination of the 2-level hash-indexed tree based method
and the pseudo-projection based upward checking method. Interested readers are
encouraged to refer to [ 69 ] for more details.
5.3.4
DCI_CLOSED
DCI_CLOSED [ 41 , 42 ] uses a bitwise vertical representation of the input database.
DCI_CLOSED can be executed independently on each partition of the database in
any order and, thus, also in parallel. DCI_CLOSED is designed to improve memory-
efficiency by avoiding the storage of duplicate closed itemsets. DCI_CLOSED
designs a novel strategy for searching the lattice that can detect and discard du-
plicate closed patterns on the fly. Using the concept of order-preserving generators
Search WWH ::




Custom Search