Table 10.2 Serial and parallel frequent itemset mining algorithms

Type     | Acronym            | Name                                         | Cite
Serial   | Apriori            | Apriori                                      | [1]
Serial   | ECLAT              | Equivalence CLAss Transformation             | [63]
Serial   | FP-growth          | Frequent Pattern Growth                      | [21]
Serial   | Partition          | Partition                                    | [51]
Serial   | SEAR               | Sequential Efficient Association Rules       | [40]
Serial   | TreeProjection     | TreeProjection                               | [4]
Parallel | BigFIM             | Frequent Itemset Mining for Big Data         | [39]
Parallel | CCPD               | Common Candidate Partitioned Database        | [66]
Parallel | CD                 | Count Distribution                           | [3]
Parallel | CD TreeProjection  | Count Distributed TreeProjection             | [4]
Parallel | DD                 | Data Distribution                            | [3]
Parallel | Dist-Eclat         | Distributed Eclat                            | [39]
Parallel | DPC                | Dynamic Passes Combined-counting             | [33]
Parallel | FPC                | Fixed Passes Combined-counting               | [33]
Parallel | HD                 | Hybrid Distribution                          | [19]
Parallel | HPA                | Hash Partitioned Apriori                     | [52]
Parallel | HPA-ELD            | HPA with Extremely Large Itemset Duplication | [52]
Parallel | IDD                | Intelligent Data Distribution                | [19]
Parallel | IDD TreeProjection | Intelligent Data Distribution TreeProjection | [4]
Parallel | NPA                | Non-Partitioned Apriori                      | [52]
Parallel | ParEclat           | Parallel Eclat                               | [67]
Parallel | Par-FP             | Parallel FP-growth with Sampling             | [9]
Parallel | PCCD               | Partitioned Candidate Common Database        | [66]
Parallel | PEAR               | Parallel Efficient Association Rules         | [40]
Parallel | PPAR               | Parallel PARTITION                           | [40]
Parallel | SPC                | Single Pass Counting                         | [33]
the global reduction operation. However, since each process serially builds the hash tree over all candidate itemsets, this construction step becomes a bottleneck when the process count grows sufficiently high. Also, because each process keeps a local copy of the hash tree for all candidate itemsets, the hash tree will likely not fit in the main memory of each process when the number of candidate itemsets becomes too large. In that case, the hash tree must be partitioned on disk and the set of transactions scanned once for each partition to compute candidate itemset supports, a computationally prohibitive exercise in the context of Big Data.
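To make this counting pattern concrete, the following is a minimal, single-machine Python sketch of the CD-style counting step (the sources describe no particular implementation language). List partitions stand in for per-process portions of the database, a simple containment test stands in for the hash tree lookup, and summing the per-partition counters stands in for the global reduction; all data and names are illustrative.

from collections import Counter

def local_counts(partition, candidates):
    # Count candidate itemset supports over one process's local partition.
    counts = Counter()
    cands = [frozenset(c) for c in candidates]
    for transaction in partition:
        t = set(transaction)
        for c in cands:
            if c <= t:  # candidate itemset contained in the transaction
                counts[c] += 1
    return counts

def count_distribution(partitions, candidates):
    # CD-style step: every "process" counts all candidates over its own
    # partition; summing the per-process counts plays the role of the
    # global reduction.
    global_counts = Counter()
    for partition in partitions:  # in CD these counts are computed in parallel
        global_counts += local_counts(partition, candidates)
    return global_counts

# Hypothetical toy data: two partitions of a transaction database.
partitions = [
    [("a", "b", "c"), ("a", "c")],
    [("b", "c"), ("a", "b", "c"), ("c",)],
]
candidates = [("a", "c"), ("b", "c")]
print(count_distribution(partitions, candidates))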
Mueller developed a serial FIM algorithm named SEAR [40], which is identical to Apriori except that it uses a trie in place of the hash tree data structure. Mueller also extended SEAR along the lines of the CD algorithm, yielding a parallel version he named PEAR [40].
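To illustrate the data structure SEAR substitutes for the hash tree, here is a minimal Python sketch of a trie (prefix tree) over sorted candidate itemsets, with supports counted by descending the trie for each transaction. This is an illustrative reconstruction under the usual prefix-tree formulation, not Mueller's original implementation.

class TrieNode:
    def __init__(self):
        self.children = {}        # item -> TrieNode
        self.count = 0            # support count if this node is a candidate
        self.is_candidate = False

def insert(root, itemset):
    # Insert a candidate itemset along its sorted-item path.
    node = root
    for item in sorted(itemset):
        node = node.children.setdefault(item, TrieNode())
    node.is_candidate = True

def count_supports(root, transaction):
    # Increment the count of every candidate contained in the transaction.
    items = sorted(set(transaction))
    def descend(node, start):
        if node.is_candidate:
            node.count += 1
        for i in range(start, len(items)):
            child = node.children.get(items[i])
            if child is not None:
                descend(child, i + 1)
    descend(root, 0)

# Hypothetical usage with made-up candidates and transactions.
root = TrieNode()
for cand in [("a", "b"), ("a", "c"), ("b", "c")]:
    insert(root, cand)
for txn in [("a", "b", "c"), ("a", "c"), ("b",)]:
    count_supports(root, txn)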
Another PFIM algorithm based on the CD algorithm is the Count Distribution TreeProjection algorithm developed by Agarwal et al. [4], a parallel extension of their serial TreeProjection algorithm. In this parallel version, identical lexicographic trees are built on each process in place of the hash trees of the original CD algorithm. The support counts are computed from the lexicographic trees and then globally reduced. This method shares scalability characteristics with other CD-based algorithms.
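Since the lexicographic trees are identical on every process, a given tree node refers to the same candidate itemset everywhere, so the global reduction amounts to an element-wise sum of per-node count arrays. The following short Python sketch shows only that reduction step, with made-up counts; it is not the authors' implementation.

def reduce_tree_counts(per_process_counts):
    # Element-wise sum over the per-node support counts reported by each
    # process; node i denotes the same candidate itemset on every process
    # because the lexicographic trees are built identically.
    return [sum(node_counts) for node_counts in zip(*per_process_counts)]

# Hypothetical counts from three processes for the same four tree nodes.
local = [
    [5, 3, 2, 1],  # process 0
    [4, 4, 1, 0],  # process 1
    [6, 2, 3, 2],  # process 2
]
print(reduce_tree_counts(local))  # [15, 9, 6, 3]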
 