Frequent Pattern Mining Algorithms: A Survey - Frequent Pattern Mining

Database Reference

In-Depth Information

3.3.1

Eclat

Eclat uses a breadth-first approach like Savasere et al.'s algorithm [ 57 ] on lattice

partitions, after partitioning the candidate set into disjoint groups, using a candidate

partitioning approach similar to earlier parallel versions of the Apriori algorithm.

The Eclat [ 71 ] algorithm is best described with the concept of an enumeration tree

because of the wide variation in the different strategies used by the algorithm. An

important contribution of Eclat [ 71 ] is to recognize the earlier pioneering work of

the Monet and Partition algorithms [ 56 , 57 ] on recursive intersection of tid lists, and

propose many efficient variants of this paradigm.

Different variations of Eclat explore the candidates in different strategies. The

earliest description of Eclat may be found in [ 74 ]. A journal paper exploring differ-

ent aspects of Eclat may be found in [ 71 ]. In the earliest versions of the work [ 74 ], a

breadth-first strategy is used. The journal version in [ 71 ] also presents experimental

results for only the breadth-first strategy, although the possibility of a depth-first

strategy is mentioned in the paper. Therefore, the original Eclat algorithm should be

considered a breadth-first algorithm. More recent depth-first versions of Eclat , such

as dEclat , use recursive tidlist intersection with differencing [ 72 ], and realize the full

benefit of the depth-first approach. The Eclat algorithm, as presented in [ 74 ], uses a

levelwise strategy in which all ( k

1)-candidates within a lattice partition are gener-

ated from frequent k -patterns in level-wise fashion, as in Apriori . The tidlists are used

to perform support counting. The frequent patterns are determined from these tidlists .

At this point, a new levelwise phase is initiated for frequent patterns of size ( k +

1).

Other variations and depth-first exploration strategies of Eclat , along with exper-

imental results, are presented in later work such as dEclat [ 72 ]. The dEclat work in

[ 72 ] presents some additional enhancements such as diffsets to improve counting. In

this chapter, we present a simplified pseudo-code of this version of Eclat . The algo-

rithm is presented in Fig. 2.8 . The algorithm is structured as a recursive algorithm. A

pattern set

is part of the input, and is set to the set of all frequent 1-items at the

top level call. Therefore, it may be assumed that, at the top level, the set of frequent

1-items and tidlists have already been computed, though this computation is not

shown in the pseudocode. In each recursive call of Eclat , a new set of candidates

FP i is generated for every pattern (itemset) P i , which extends the itemset by one

unit. The support of a candidate is determined with the use of tidlist intersection.

Finally, if P i is frequent, it is added to a pattern set

FP i for the next level.

Figure 2.7 illustrates the itemset generation tree with support computation by

tidlist intersection for the sample database from Table 2.1 . The corresponding tidlists

in the tree are also illustrated. All infrequent itemsets in each level are denoted by dot-

ted, and bordered rectangles. For example, an itemset ab is generated by joining b to

a . The tidlist of ( a )is

. We can determine the

support of ab by intersecting the two tidlists to obtain the tidlist

{

1, 2, 3, 5

}

, and the tidlist of b is

{

1, 2, 4, 5

}

of these can-

didates. Therefore, the support of ab is given by the length of this tidlist , which is 3.

Further gains may be obtained with the use of the notion of diffsets [ 72 ]. This

approach realizes the true power of vertical pattern mining. The basic idea, in diffsets

is to maintain only the portion of the tidlists at a node, that correspond to the change in

the inverted list from the parent node. Thus, the tidlists at a node can be reconstructed

by examining the tidlists at the ancestors of a node in the tree. The major advantage

{

1, 2, 5

}

Frequent Pattern Mining

Search WWH ::

Custom Search

Home