Frequent Pattern Mining Algorithms: A Survey - Frequent Pattern Mining - page 19

Database Reference

In-Depth Information

Fig. 2.1 A generic frequent pattern mining algorithm

framework are provided in a different chapter. Surveys on frequent pattern mining

may be found in [ 26 , 33 ].

One of the main reasons for the high level of interest in frequent pattern mining

algorithms is due to the computational challenge of the task. Even for a moderate

sized dataset, the search space of FPM is enormous, which is exponential to the

length of the transactions in the dataset. This naturally creates challenges for itemset

generation, when the support levels are low. In fact, in most practical scenarios, the

support levels at which one can mine the corresponding itemsets are limited (bounded

below) by the memory and computational constraints. Therefore, it is critical to be

able to perform the analysis in a space- and time-efficient way. During the first few

years of research in this area, the primary focus of work was to find FPM algorithms

with better computational efficiency.

Several classes of algorithms have been developed for frequent pattern mining,

many of which are closely related to one another. In fact, the execution tree of all the

algorithms is mostly different in terms of the order in which the patterns are explored,

and whether the counting work done for different candidates is independent of one

another. To explain this point, we introduce a primitive “baseline” algorithm that

forms the heart of most frequent pattern mining algorithms.

Figure 2.1 presents the pseudocode for a very simple “baseline” frequent pattern

mining algorithm. The algorithm takes the transaction database

and a user-defined

support value s as input. It first populates all length-one frequent patterns in a frequent

pattern data-store,

T

. Then it generates a candidate pattern and computes its support

in the database. If the support of the candidate pattern is equal or higher than the

minimum support threshold the pattern is stored in

FP

FP

. The process continues until

all the frequent patterns from the database are found.

In the aforementioned algorithm, candidate patterns are generated from the previ-

ously generated frequent patterns. Then, the transaction database is used to determine

which of the candidates are truly frequent patterns. The key issues of computa-

tional efficiency arise in terms of generating the candidate patterns in an orderly and

carefully designed fashion, pruning irrelevant and duplicate candidates, and using

well chosen tricks to minimize the work in counting the candidates. Clearly, the

Next Page

Frequent Pattern Mining

Search WWH ::

Custom Search

Home