Database Reference
In-Depth Information
4.1.1
Single-pass Filtering
The simplest filtering approach uses the following greedy search strategy:
1. Start with an 'empty' model M .
2. Start with an 'empty' model M .
3. Add patterns F
to M one by one. If the addition leads to better compression,
keep it, otherwise, permanently discard F .
F
Although the basic principle of this approach is very simple, note that there are
important details that need to be worked out depending on the specific model and
encoding. For example, it is often impossible to start with a model that is truly empty:
if a model does not contain any patterns at all, it may be impossible to encode the
data at hand and hence there is no compressed size to start from. Also, adding a
pattern to a model is not always straightforward: how and where in the model should
it be added? Depending on design choices, there may be many possibilities and if
these need all to be tested this can become a computational burden. Finally, in what
order should we consider the candidates in
? Since single-pass filtering considers
every candidate only once, this choice will have a large impact on the final result.
F
Example 8.12 KRIMP employs single-pass filtering with several heuristic choices
to ensure that it can induce good code tables from relatively large datasets and
candidate sets in reasonable time.
To ensure any transaction can be encoded, the induction process departs from the
code table containing all singleton itemsets, i.e.,
. Candidate itemsets are
considered in a fixed order, on frequencies and lengths, maximizing the probability
that we encounter candidates that aid compression. Finally, with the same goal,
we imposed an order on the itemsets in the code table. Together with the cover
function, which does not allow overlap, this means that each candidate itemset can
be efficiently evaluated. To further illustrate this example, the KRIMP algorithm is
given as Algorithm 5 .
{{ i }| i I }
 
Search WWH ::




Custom Search