Database Reference
In-Depth Information
boundaries will be missed. The algorithm attempts to minimize these lost subgraphs
by assigning frequent edges large weights before partitioning, leading to fewer cuts
across them.
MapReduce offers a framework for developing parallel programs that operate
gracefully using secondary non-volatile storage. Secondary storage devices, such as
hard disk drives, are almost always larger than the amount of random access memory
available on a machine. MapReduce-based Pattern Finding (MRPF) by Liu et al. [ 34 ]
is a MapReduce framework for the FGM problem that assumes
is a single graph.
It employs a candidate generation scheme that uses two iterations of MapReduce to
explore each level of the search lattice. In the first iteration, the MAP phase generates
size- k candidates. The following REDUCE phase detects duplicates and removes them.
Finally, the second MapReduce pass computes the support of new candidates. Lu
et al. [ 35 ] developed MapReduce based Frequent Subgraph Extraction (MRFSE), a
candidate generation MapReduce implementation for the traditional many-graph
G
G
.
On iteration k ,aMAP task takes one graph from
and emits size- k candidates. Each
REDUCE task takes all of the appearances of a candidate and computes its support.
MRFSE also stores embedding lists to maintain the location of all appearances of
frequent subgraphs in
G
. By doing so, isomorphism tests are no longer necessary, at
the cost of additional memory usage.
G
6.2.2
Work Partitioning
Candidate generation methods consider a different set of candidates at each itera-
tion. Much like the parallel Apriori methods seen in Sect. 4 , the candidate generation
process can easily be decomposed into parallel tasks. Wang and Parthasarathy de-
veloped the MotifMiner Toolkit for distributed memory systems [ 57 ]. MotifMiner is
designed for both the case when
is a single graph. It
has a similar structure as the Data Distribution method from Sect. 4 .
G
is a set of graphs and when
G
is assumed to
be available to all processes and each process is assigned some subset of candidate
generation to perform. Once all processes have generated their local candidate sets,
each process shares its local candidates with all others.
Many parallel candidate generation methods identify the individual candidates to
expand as independent tasks. MRFSE, a MapReduce algorithm by Lu et al. [ 35 ],
assigns instead each graph in
G
as a task. The authors noted that the simple assignment
of the same number of graphs to each MAP task is not a good work partitioning scheme
because graphs in
G
G
can vary greatly in size. Instead, they count the edges of each
graph in
G
and assign tasks such that each MAP is assigned a roughly equal number
of edges.
SP-SUBDUE partitions the single graph
across processes in a distributed set-
ting. Each process then runs an independent instance of SUBDUE and support counts
are finalized by broadcasting the frequent subgraphs to all other processes. When
a process receives a new frequent subgraph from another partition, it computes its
G
Search WWH ::




Custom Search