Database Reference
In-Depth Information
Minimum Sum of Cost and Minimum Support A first approach for dealing with
a monotonic minimum-sum-of-cost constraint is to add the following propagation
[ 26 ]:
3 if the sum of costs of itemset I U
is lower than the desired threshold, set stop to
true.
The rationale for this propagation step is that we can stop a branch of the search if
the most expensive itemset we can still reach is not expensive enough.
The benefit of this approach is that this propagation step is relatively easy to
calculate, while for high thresholds it will already prune effectively.
A more elaborate approach was proposed by Bonchi et al. [ 5 , 6 , 7 ]. Its essential
observation is that if an itemset needs to be both frequent and expensive, this means
that a certain number of transactions in the data needs to be expensive as well. This
leads to the following propagation steps:
1. T U
cover ( I L );
2. I U is set to those items in I U that are frequent in the database restricted to the
transactions in T U ;
3. from T U all transactions d are removed for which c ( d I U ) , where θ is the
cost threshold;
4. if T U
is set to T U
was changed, go back to step 2;
I L , set stop to true.
The interesting idea in this approach is the presence of a feedback loop: the removal
of items can make some transactions too cheap; when a transaction is too cheap, it
will not be in the cover of an itemset, and we can remove it from consideration; this
however will reduce the support of items further, potentially making them infrequent
in the projected database.
The advantage of this approach is that it can reduce the size of the search tree even
further. The disadvantage is that the propagation is more complex too calculate, as
it involves a traversal of the data. To remedy this, Bonchi et al. [ 6 ] studied settings
in which the above loop is not executed in all nodes of the seach tree.
Minimum and Maximum Support A similar idea can be used when we have a
minimum support threshold on some transactions (
5. if I U
D + ), and a maximum support
D )[ 10 , 18 ].
threshold on the other transactions (
1. T U is set to cover ( I L );
2. I U is set to those items in I U that are frequent in the database restricted to the
transactions in T U D + ;
3. T L is set to cover ( I U );
4. if
T L D |
, where θ is the maximum support threshold for the negative
examples, then set stop to true.
|
In this approach, the main idea is that if the lowest support that can be reached on
the negative transactions is not low enough, we can stop the search.
Maximal Frequent Itemsets Of particular interest in the constraint-based mining
literature is the discovery of border (or boundary) representations. The simplest such
Search WWH ::




Custom Search