Big Data Frequent Pattern Mining - Frequent Pattern Mining

Database Reference

In-Depth Information

Table 10.4 Serial and parallel frequent graph mining algorithms

Type

Acronym

Name

Cite

Serial

AGM

Apriori-based Graph Mining

[ 25 ]

Serial

FSG

Frequent Subgraph Mining

[ 29 ]

Serial

gSpan

Graph-based Substructure Pattern Mining

[ 60 ]

Serial

HSIGRAM

Horizontal Single Graph Miner

[ 30 ]

Serial

MoFa

Molecular Fragment Miner

[ 6 ]

Serial

SUBDUE

Substructure Discovery

[ 24 ]

Serial

VSIGRAM

Vertical Single Graph Miner

[ 30 ]

Parallel

p-MoFa

Parallel MoFa

[ 37 ]

Parallel

p-gSpan

Parallel gSpan

[ 37 ]

Parallel

d-MoFa

Distributed MoFa with Dynamic Load Balancing

[ 14 ]

Parallel

MotifMiner

MotifMiner Toolkit

[ 57 ]

Parallel

MRFSE

MapReduce-based Frequent Subgraph Extraction

[ 35 ]

Parallel

MRPF

MapReduce-based Pattern Finding

[ 34 ]

Parallel

SP-SUBDUE

Static Partitioning SUBDUE

[ 11 ]

Parallel

SP-SUBDUE-2

Static Partitioning SUBDUE 2

[ 47 ]

6

Frequent Graph Mining

Successful algorithms for frequent graph mining (FGM) are often directly related to

those designed for FIM. In this section, we provide a brief overview of some key

serial methods for FGM and then discuss strategies and challenges for parallelizing

FGM. For easy reference, Table 10.4 lists the serial and parallel methods described

in this section.

6.1

Serial Frequent Graph Mining

Frequent graph mining comes with an added computational expense that is not present

in frequent itemset or sequence mining. Determining if one graph is a subgraph of

another is an NP-complete problem, known as the subgraph isomorphism problem .

The cost of pruning infrequent candidates by performing isomorphism checks is very

costly and thus most FGM methods make an effort to reduce the amount of pruning

necessary.

Inokuchi et al. [ 25 ] developed Apriori-based Graph Mining (AGM), which ex-

tended the Apriori algorithm to FGM. During candidate generation, subgraphs are

extended by joining two frequent subgraphs and expanding by one edge in all pos-

sible ways. Kuramochi and Karypis [ 29 ] described the Frequent Subgraph mining

algorithm (FSG), another candidate generation based method. Size- k candidates are

generated by joining two frequent ( k

−

1)-subgraphs. Subgraphs are joined only if

they share the same ( k

2)-subgraph. When candidates are joined, an edge is grown

by either connecting two existing vertices or adding a new vertex. Infrequent sub-

graphs must be pruned after all size- k candidates are generated. FSG optimizes the

pruning stage by only adding edges during candidate generation which are known to

−

Frequent Pattern Mining

Search WWH ::

Custom Search

Home