FIGURE 2.19 BSP programming model. (Figure labels: processors, local computation, communication, and barrier synchronization within a superstep.)
model is well suited for distributed implementations as it doesn't expose any mechanism for detecting order of execution within a superstep, and all communication is from superstep S to superstep S + 1. The ideas of Pregel have been cloned by many open-source projects such as GoldenOrb,* Apache Hama,† and Apache Giraph.‡
Both Hama and Giraph are implemented to be launched as typical Hadoop jobs that can leverage the Hadoop infrastructure. Other large-scale graph processing systems that neither follow the MapReduce model nor leverage the Hadoop infrastructure include GRACE [130], GraphLab [96,97], and Signal/Collect [122].
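The superstep discipline described above can be illustrated with a small single-process sketch. It is not the API of Pregel, Hama, or Giraph; the function name `bsp_max_value` and the dictionary-based graph format are assumptions made for illustration. Each vertex reads only its own state and the messages delivered to it, and messages sent in superstep S become visible only in superstep S + 1, so the order in which vertices run inside a superstep cannot affect the result.

```python
from collections import defaultdict


def bsp_max_value(graph, values):
    """Propagate the maximum vertex value through a graph, BSP-style.

    graph  : dict mapping vertex -> list of out-neighbours (assumed format;
             every vertex must appear as a key)
    values : dict mapping vertex -> initial numeric value
    Returns (final values, number of supersteps executed).
    """
    values = dict(values)
    inbox = {v: [] for v in graph}   # messages visible in the current superstep
    active = set(graph)              # every vertex is active in superstep 0
    superstep = 0

    while active:
        outbox = defaultdict(list)   # messages produced during this superstep
        for v in active:
            # Local computation: a vertex sees only its own value and inbox.
            new_value = max([values[v]] + inbox[v])
            changed = new_value != values[v]
            values[v] = new_value
            if superstep == 0 or changed:
                for neighbour in graph[v]:
                    outbox[neighbour].append(new_value)
            # Otherwise the vertex sends nothing, which acts as a vote to halt.

        # Barrier: messages sent in superstep S are delivered for S + 1.
        inbox = {v: outbox.get(v, []) for v in graph}
        active = {v for v in graph if inbox[v]}
        superstep += 1

    return values, superstep


if __name__ == "__main__":
    toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(bsp_max_value(toy_graph, {"a": 3, "b": 6, "c": 2}))
```

In a distributed setting the loop over `active` would run in parallel across workers and the inbox swap would be a global barrier; the sketch keeps both in one process only to make the S to S + 1 message flow visible.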
The Dedoop system (Deduplication with Hadoop) [82,83] has been presented as an entity resolution framework based on MapReduce. It supports the definition of complex entity resolution workflows that can include different matching steps and/or apply machine learning mechanisms for the automatic generation of match classifiers. The defined workflows are then automatically translated into MapReduce jobs for parallel execution on Hadoop clusters. MapDupReducer [129] is another MapReduce-based system, developed for near-duplicate detection over massive data sets using the PPJoin (Positional and Prefix filtering) algorithm [132].
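To give a feel for the prefix-filtering idea that PPJoin builds on, here is a minimal single-process sketch. It is not the MapDupReducer implementation and it omits PPJoin's positional filter; the function names, the Jaccard measure, and the rarest-token-first ordering are assumptions chosen for illustration. The grouping by prefix token plays the role a map and shuffle phase would play on a cluster, and the final verification corresponds to a reduce step.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets are treated as 0.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def prefix_filter_join(records, threshold):
    """Return pairs of record ids whose Jaccard similarity is >= threshold.

    records   : dict mapping record id -> iterable of tokens (assumed format)
    threshold : Jaccard threshold in (0, 1]
    """
    token_sets = {rid: set(tokens) for rid, tokens in records.items()}

    # Global token order: rarest tokens first, ties broken alphabetically.
    freq = Counter(tok for toks in token_sets.values() for tok in toks)

    # "Map" side: index each record under the tokens of its prefix only.
    index = defaultdict(list)
    for rid, toks in token_sets.items():
        ordered = sorted(toks, key=lambda t: (freq[t], t))
        n = len(ordered)
        # Two records with Jaccard >= threshold must share a token in these
        # prefixes. The small epsilon guards against floating-point overshoot.
        prefix_len = n - math.ceil(threshold * n - 1e-9) + 1
        for tok in ordered[:prefix_len]:
            index[tok].append(rid)

    # "Reduce" side: records sharing a prefix token become candidate pairs.
    candidates = set()
    for rids in index.values():
        candidates.update(combinations(sorted(rids), 2))

    # Verification: compute the exact similarity only for the candidates.
    return {
        (a, b)
        for a, b in candidates
        if jaccard(token_sets[a], token_sets[b]) >= threshold
    }


if __name__ == "__main__":
    docs = {
        "r1": ["a", "b", "c", "d"],
        "r2": ["a", "b", "c", "e"],
        "r3": ["x", "y", "z"],
    }
    print(prefix_filter_join(docs, 0.5))
```

The benefit is that dissimilar records such as `r3` never meet any other record in the index, so the exact similarity is computed for far fewer pairs than an all-pairs comparison would require.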
An approach to efficiently perform set-similarity joins in parallel using the MapReduce framework has been proposed by Vernica et al. [128]. In particular, they propose a three-stage approach for end-to-end set-similarity joins. The approach takes as input a set of records and outputs a set of joined records based on a set-similarity condition. It partitions the data across nodes to balance the workload and minimize the need for replication. J. Lin [92] has presented three MapReduce algorithms for computing pairwise similarity on document collections. The first
* http://goldenorbos.org/.
† http://hama.apache.org/.
‡ http://giraph.apache.org/.