Database Reference
In-Depth Information
3.3.1
Eclat
Eclat uses a breadth-first approach like Savasere et al.'s algorithm [ 57 ] on lattice
partitions, after partitioning the candidate set into disjoint groups, using a candidate
partitioning approach similar to earlier parallel versions of the Apriori algorithm.
The Eclat [ 71 ] algorithm is best described with the concept of an enumeration tree
because of the wide variation in the different strategies used by the algorithm. An
important contribution of Eclat [ 71 ] is to recognize the earlier pioneering work of
the Monet and Partition algorithms [ 56 , 57 ] on recursive intersection of tid lists, and
propose many efficient variants of this paradigm.
Different variations of Eclat explore the candidates in different strategies. The
earliest description of Eclat may be found in [ 74 ]. A journal paper exploring differ-
ent aspects of Eclat may be found in [ 71 ]. In the earliest versions of the work [ 74 ], a
breadth-first strategy is used. The journal version in [ 71 ] also presents experimental
results for only the breadth-first strategy, although the possibility of a depth-first
strategy is mentioned in the paper. Therefore, the original Eclat algorithm should be
considered a breadth-first algorithm. More recent depth-first versions of Eclat , such
as dEclat , use recursive tidlist intersection with differencing [ 72 ], and realize the full
benefit of the depth-first approach. The Eclat algorithm, as presented in [ 74 ], uses a
levelwise strategy in which all ( k
+
1)-candidates within a lattice partition are gener-
ated from frequent k -patterns in level-wise fashion, as in Apriori . The tidlists are used
to perform support counting. The frequent patterns are determined from these tidlists .
At this point, a new levelwise phase is initiated for frequent patterns of size ( k +
1).
Other variations and depth-first exploration strategies of Eclat , along with exper-
imental results, are presented in later work such as dEclat [ 72 ]. The dEclat work in
[ 72 ] presents some additional enhancements such as diffsets to improve counting. In
this chapter, we present a simplified pseudo-code of this version of Eclat . The algo-
rithm is presented in Fig. 2.8 . The algorithm is structured as a recursive algorithm. A
pattern set
is part of the input, and is set to the set of all frequent 1-items at the
top level call. Therefore, it may be assumed that, at the top level, the set of frequent
1-items and tidlists have already been computed, though this computation is not
shown in the pseudocode. In each recursive call of Eclat , a new set of candidates
FP i is generated for every pattern (itemset) P i , which extends the itemset by one
unit. The support of a candidate is determined with the use of tidlist intersection.
Finally, if P i is frequent, it is added to a pattern set
FP
FP i for the next level.
Figure 2.7 illustrates the itemset generation tree with support computation by
tidlist intersection for the sample database from Table 2.1 . The corresponding tidlists
in the tree are also illustrated. All infrequent itemsets in each level are denoted by dot-
ted, and bordered rectangles. For example, an itemset ab is generated by joining b to
a . The tidlist of ( a )is
. We can determine the
support of ab by intersecting the two tidlists to obtain the tidlist
{
1, 2, 3, 5
}
, and the tidlist of b is
{
1, 2, 4, 5
}
of these can-
didates. Therefore, the support of ab is given by the length of this tidlist , which is 3.
Further gains may be obtained with the use of the notion of diffsets [ 72 ]. This
approach realizes the true power of vertical pattern mining. The basic idea, in diffsets
is to maintain only the portion of the tidlists at a node, that correspond to the change in
the inverted list from the parent node. Thus, the tidlists at a node can be reconstructed
by examining the tidlists at the ancestors of a node in the tree. The major advantage
{
1, 2, 5
}
Search WWH ::




Custom Search