Table 10.2 Serial and parallel frequent itemset mining algorithms

Type     | Acronym            | Name                                         | Cite
Serial   | Apriori            | Apriori                                      | [1]
Serial   | ECLAT              | Equivalence CLAss Transformation             | [63]
Serial   | FP-growth          | Frequent Pattern Growth                      | [21]
Serial   | Partition          | Partition                                    | [51]
Serial   | SEAR               | Sequential Efficient Association Rules       | [40]
Serial   | TreeProjection     | TreeProjection                               | [4]
Parallel | BigFIM             | Frequent Itemset Mining for Big Data         | [39]
Parallel | CCPD               | Common Candidate Partitioned Database        | [66]
Parallel | CD                 | Count Distribution                           | [3]
Parallel | CD TreeProjection  | Count Distributed TreeProjection             | [4]
Parallel | DD                 | Data Distribution                            | [3]
Parallel | Dist-Eclat         | Distributed Eclat                            | [39]
Parallel | DPC                | Dynamic Passes Combined-counting             | [33]
Parallel | FPC                | Fixed Passes Combined-counting               | [33]
Parallel | HD                 | Hybrid Distribution                          | [19]
Parallel | HPA                | Hash Partitioned Apriori                     | [52]
Parallel | HPA-ELD            | HPA with Extremely Large Itemset Duplication | [52]
Parallel | IDD                | Intelligent Data Distribution                | [19]
Parallel | IDD TreeProjection | Intelligent Data Distribution TreeProjection | [4]
Parallel | NPA                | Non-Partitioned Apriori                      | [52]
Parallel | ParEclat           | Parallel Eclat                               | [67]
Parallel | Par-FP             | Parallel FP-growth with Sampling             | [9]
Parallel | PCCD               | Partitioned Candidate Common Database        | [66]
Parallel | PEAR               | Parallel Efficient Association Rules         | [40]
Parallel | PPAR               | Parallel PARTITION                           | [40]
Parallel | SPC                | Single Pass Counting                         | [33]
the global reduction operation. However, since each process serially builds the hash tree over all candidate itemsets, this construction step becomes a bottleneck when the process count grows sufficiently high. Also, because each process keeps a local copy of the hash tree for all candidate itemsets, the hash tree will likely not fit in the main memory of each process when the number of candidate itemsets becomes too large. In that case, the hash tree must be partitioned on disk and the set of transactions scanned once for each partition to compute candidate itemset supports, a computationally prohibitive exercise in the context of Big Data.
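To make this counting pattern concrete, the following is a minimal, single-machine Python sketch of the CD-style counting step (the sources describe no particular implementation language). List partitions stand in for per-process portions of the database, a simple containment test stands in for the hash tree lookup, and summing the per-partition counters stands in for the global reduction; all data and names are illustrative.

from collections import Counter

def local_counts(partition, candidates):
    # Count candidate itemset supports over one process's local partition.
    counts = Counter()
    cands = [frozenset(c) for c in candidates]
    for transaction in partition:
        t = set(transaction)
        for c in cands:
            if c <= t:  # candidate itemset contained in the transaction
                counts[c] += 1
    return counts

def count_distribution(partitions, candidates):
    # CD-style step: every "process" counts all candidates over its own
    # partition; summing the per-process counts plays the role of the
    # global reduction.
    global_counts = Counter()
    for partition in partitions:  # in CD these counts are computed in parallel
        global_counts += local_counts(partition, candidates)
    return global_counts

# Hypothetical toy data: two partitions of a transaction database.
partitions = [
    [("a", "b", "c"), ("a", "c")],
    [("b", "c"), ("a", "b", "c"), ("c",)],
]
candidates = [("a", "c"), ("b", "c")]
print(count_distribution(partitions, candidates))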
Mueller developed a serial FIM algorithm named SEAR [40], which is identical to Apriori except that it uses a trie in place of the hash tree data structure. Mueller also extended SEAR along the lines of the CD algorithm, yielding a parallel version he named PEAR [40].
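To illustrate the data structure SEAR substitutes for the hash tree, here is a minimal Python sketch of a trie (prefix tree) over sorted candidate itemsets, with supports counted by descending the trie for each transaction. This is an illustrative reconstruction under the usual prefix-tree formulation, not Mueller's original implementation.

class TrieNode:
    def __init__(self):
        self.children = {}        # item -> TrieNode
        self.count = 0            # support count if this node is a candidate
        self.is_candidate = False

def insert(root, itemset):
    # Insert a candidate itemset along its sorted-item path.
    node = root
    for item in sorted(itemset):
        node = node.children.setdefault(item, TrieNode())
    node.is_candidate = True

def count_supports(root, transaction):
    # Increment the count of every candidate contained in the transaction.
    items = sorted(set(transaction))
    def descend(node, start):
        if node.is_candidate:
            node.count += 1
        for i in range(start, len(items)):
            child = node.children.get(items[i])
            if child is not None:
                descend(child, i + 1)
    descend(root, 0)

# Hypothetical usage with made-up candidates and transactions.
root = TrieNode()
for cand in [("a", "b"), ("a", "c"), ("b", "c")]:
    insert(root, cand)
for txn in [("a", "b", "c"), ("a", "c"), ("b",)]:
    count_supports(root, txn)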
Another PFIM algorithm based on the CD algorithm is the Count Distribution TreeProjection algorithm developed by Agarwal et al. [4], a parallel extension of their serial TreeProjection algorithm. In this parallel version, identical lexicographic trees are built on each process in place of the hash trees of the original CD algorithm. The support counts are computed from the lexicographic trees and then globally reduced. This method shares scalability characteristics with other CD-based algorithms.
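Since the lexicographic trees are identical on every process, a given tree node refers to the same candidate itemset everywhere, so the global reduction amounts to an element-wise sum of per-node count arrays. The following short Python sketch shows only that reduction step, with made-up counts; it is not the authors' implementation.

def reduce_tree_counts(per_process_counts):
    # Element-wise sum over the per-node support counts reported by each
    # process; node i denotes the same candidate itemset on every process
    # because the lexicographic trees are built identically.
    return [sum(node_counts) for node_counts in zip(*per_process_counts)]

# Hypothetical counts from three processes for the same four tree nodes.
local = [
    [5, 3, 2, 1],  # process 0
    [4, 4, 1, 0],  # process 1
    [6, 2, 3, 2],  # process 2
]
print(reduce_tree_counts(local))  # [15, 9, 6, 3]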
 