Database Reference
In-Depth Information
Propagation may signal that no solution can be found in the current branch of the
search tree by setting stop to true. If the lower- and upper-bound for the itemset are
identical, a pattern has been found and is added to the output. Otherwise, in line 6 an
item is selected, which is recursively added to the itemset (line 7), or removed from
consideration (line 8).
Note that the same itemset can never be found twice: an item which is added in
line 7, will never be added to an itemset that is considered in the search tree explored
in the call of line 8.
We will now consider how this algorithm can be instantiated for different types of
mining settings.
Frequent Itemset Mining This is the most simple setting. Essentially, in this case,
the following propagation steps are executed:
1. T U is restricted to cover ( I L );
2. I U is restricted to those items in I U that are frequent in the database containing
only the transactions of T U (in other words, the items that are frequent in the
projected database for itemset I L , see Chap. 3);
T L and I L are not modified by propagation.
For these choices, the search is highly similar to that of Eclat or FP-Growth; at
every point in the search tree, we maintain a list of candidate items that can be added
to the current itemset; the set of candidate items is reduced based on the minimum
support threshold.
Attentive readers may have noticed that the search tree for the generic algorithm
presented here is binary, whereas for most itemset mining algorithms the tree is not
binary. This is, however, only a minor conceptual difference: the recursive calls in
line 8 of our generic algorithm essentially correspond to a traversal of the candidate
list in traditional frequent itemset mining algorithms, where we remember which
items we may no longer consider.
The clear benefit of the non-traditional perspective is that other constraints can be
added with minor effort.
Search WWH ::




Custom Search