Database Reference
In-Depth Information
Fig. 2.1 A generic frequent pattern mining algorithm
framework are provided in a different chapter. Surveys on frequent pattern mining
may be found in [ 26 , 33 ].
One of the main reasons for the high level of interest in frequent pattern mining
algorithms is due to the computational challenge of the task. Even for a moderate
sized dataset, the search space of FPM is enormous, which is exponential to the
length of the transactions in the dataset. This naturally creates challenges for itemset
generation, when the support levels are low. In fact, in most practical scenarios, the
support levels at which one can mine the corresponding itemsets are limited (bounded
below) by the memory and computational constraints. Therefore, it is critical to be
able to perform the analysis in a space- and time-efficient way. During the first few
years of research in this area, the primary focus of work was to find FPM algorithms
with better computational efficiency.
Several classes of algorithms have been developed for frequent pattern mining,
many of which are closely related to one another. In fact, the execution tree of all the
algorithms is mostly different in terms of the order in which the patterns are explored,
and whether the counting work done for different candidates is independent of one
another. To explain this point, we introduce a primitive “baseline” algorithm that
forms the heart of most frequent pattern mining algorithms.
Figure 2.1 presents the pseudocode for a very simple “baseline” frequent pattern
mining algorithm. The algorithm takes the transaction database
and a user-defined
support value s as input. It first populates all length-one frequent patterns in a frequent
pattern data-store,
T
. Then it generates a candidate pattern and computes its support
in the database. If the support of the candidate pattern is equal or higher than the
minimum support threshold the pattern is stored in
FP
FP
. The process continues until
all the frequent patterns from the database are found.
In the aforementioned algorithm, candidate patterns are generated from the previ-
ously generated frequent patterns. Then, the transaction database is used to determine
which of the candidates are truly frequent patterns. The key issues of computa-
tional efficiency arise in terms of generating the candidate patterns in an orderly and
carefully designed fashion, pruning irrelevant and duplicate candidates, and using
well chosen tricks to minimize the work in counting the candidates. Clearly, the
Search WWH ::




Custom Search