Database Reference
In-Depth Information
support in a single graph is that some embeddings of a subgraph G could be overlap-
ping. Kuramochi and Karypis [ 30 ] show that, if overlapping subgraphs are counted,
the problem is no longer downward closed. Thus, they only consider edge-disjoint
embeddings, i.e., those embeddings of G that do not have any shared edges in
.
SUBDUE, by Holder et al. [ 24 ], operates on a single graph and was one of the first
developed FGM tools. It uses a greedy candidate generation approach and bounds
the cost of doing isomorphism checks by doing inexact graph matches. Due to its
greedy nature and inexact matches, SUBDUE is an approximate algorithm that does
not find the complete set of frequent subgraphs. Kuramochi and Karypis [ 30 ] also
addressed the scenario of
G
being a single graph. They presented two algorithms,
Horizontal Single Graph Miner (HSIGRAM) and Vertical Single Graph Miner (VSI-
GRAM), which are candidate generation and pattern growth methods, respectively.
HSIGRAM and VSIGRAM are complete FGM methods that also have the ability to
do approximate counting. In order to reduce the time for frequency counting on each
lattice level, HSIGRAM makes use of a modified form of embedding lists: instead of
storing each embedding found in
G
, a single edge called the anchor-edge is stored.
This significantly reduces the memory expense of full embedding lists at the cost of
some recomputation for each embedding.
G
6.2
Parallel Frequent Graph Mining
The exponential cost of subgraph isomorphism inherent in FGM makes parallelism
for this problem vital for Big Data input. We now describe existing parallel FGM
(PFGM) methods, focusing on how they address the issues of memory scalability,
work partitioning, and dynamic load balancing.
6.2.1
Memory Scalability
Many of the PFIM and PFSM techniques for achieving memory scalability are also
applicable in PFGM algorithms. Methods commonly require access to the entirety
of
. This has performance implications for both shared and distributed memory
systems. Meinl et al. [ 37 ] created parallel versions of gSpan and MoFa for a shared
memory system. In both algorithms,
G
is globally accessible to all processes, but
could be split across multiple machines, causing unexpected network delays. Ad-
ditionally, MoFa must be able to keep track of the already generated subgraphs in
order to avoid generating duplicates.
Cook et al. [ 11 ] developed Static Partitioning SUBDUE (SP-SUBDUE), a par-
allel, message-passing implementation of SUBDUE. Like its serial predecessor,
SP-SUBDUE expects
G
G
to be a single large graph. It uses a weighted graph par-
titioning algorithm [ 28 ] to divide
G
into as many partitions as there are processes.
Each process can then store only their local portion of
G
in memory. The conse-
quence of partitioning
G
is that any frequent subgraphs that exist across partition
Search WWH ::




Custom Search