Database Reference
In-Depth Information
setting is the discovery of maximal frequent itemsets, which can be obtained by
means of the following propagations:
1. T U is set to cover ( I L );
2. I U is set to those items in I U that are frequent in the database restricted to the
transactions in T U ;
3. T L is set to cover ( I U );
4. if some item not in I U
is frequent in the database restricted to the transactions in
T L , set stop to true.
5. if
θ , I L is set to I U .
The arguments for these steps are the following: the set T L represents those transac-
tions that will be covered by any itemset we will find in the future; if there is an item
that covers a sufficiently large number of these transactions, but we cannot add this
item in the current branch of the search tree, we stop traversing this branch in line 4,
as elsewhere we will find itemsets that include this item.
On the other hand, if the itemset consisting of all remaining items is frequent,
clearly this itemset must be maximal; we can directly include all items in the itemset.
This search strategy is embodied in the MaxMiner algorithm [ 1 ]. It was gener-
alized to the case of finding border representations under arbitrary monotonic and
anti-monotonic constraints by Bucila et al. [ 8 ].
T L |≥
|
Closed Frequent Itemsets Closed itemset mining can be achieved by another
modification of the propagation for frequent itemset mining:
1. T U is set to cover ( I L );
2. I U is set to those items in I U that are frequent in the database restricted to the
transactions in T U ;
3. let I contain those items in
which are present in all transactions in T U ;
4. if I contains items not in I U , set stop to true; otherwise, let I L be I .
Remember that in closed itemset mining the task is to find itemsets such that no
superset has the same coverage. This propagation ensures this: in line 4, if an item
can be added to the current itemset without changing the coverage, it will be added
immediately if this is allowed; however, if this item may not be added, as we branched
over it earlier, we stop the search, as we can no longer find closed itemsets in the
current part of the search space.
This search strategy is embodied in the LCM closed itemset mining algorithm
[ 30 ]. The combination of closed itemset mining with constraints was studied in
more detail in the D-MINER system by Besson et al. [ 2 , 3 ].
I
4.3
Generic Frameworks
The similarity between these depth-first search algorithms indicates that it may be
possible to combine different constraints and condensed representations. Indeed, this
is the key idea underlying most generic frameworks for constraint-based mining.
The DualMiner algorithm [ 8 ] essentially represents a generic depth-first algo-
rithm for finding border representations that extends the ideas found in the MaxMiner
Search WWH ::




Custom Search