Database Reference
In-Depth Information
systems such as GFS [30] or BigTable [17]. Such systems are suitable for process-
ing flat data structures, not just graph structured data. In particular, it is known that
much graph analysis inherently involves random access and direct adoption of the
technologies for flat files or relations may lead to high (network) communication
costs in the cloud. It is desirable to have a graph-processing platform that automati-
cally handles optimization details for users.
Graph processing platforms for the cloud have recently been proposed. Most of
these platforms (e.g., [42,43,81]) are built on top of MapReduce [23]. In this section,
we give a brief survey of some representative solutions.
7.3.1 s urvey oF e Xisting s ystems
7.3.1.1 Pregel
Pregel [58] is a vertex-oriented graph processing engine that implements a Bulk
Synchronous Parallel (BSP) model. Pregel passes computational results between
workers. It provides a user-defined API Compute () executed on vertices. In one itera-
tion of BSP (i.e., superstep in Pregel's terminology), Pregel executes Compute () on
all the vertices in parallel. Messages are passed over the network. Vertices vote to
halt if they have no work to do.
7. 3 .1. 2 P EGASUS
PEGASUS [43] is an open-source Hadoop-based library that supports typical graph
mining operations including PageRank, spectral clustering, diameter and radius esti-
mations, and connected components. An important observation is that many such
mining operations can be readily expressed as an iterative matrix-vector multiplica-
tion. PEGASUS therefore proposes a scalable, highly optimized primitive called
generalized iterated matrix-vector multiplication that includes block multiplication,
clustered edges, and diagonal block iteration.
7.3.1.3 HADI
HADI [42] is a graph mining implementation (developed on Hadoop) that estimates
the radii and diameter of a large graph. To tackle the scale of large graphs, HADI
proposes an approximation algorithm implemented and optimized for the cloud
framework Hadoop/MapReduce.
7.3.1.4 Surfer
Surfer [18] is a large graph processing engine that provides two primitives for devel-
oping applications on the cloud: MapReduce and propagation. MapReduce is useful
for applications processing flat data structures. In comparison, the second primi-
tive propagation operation is designed for developing edge-oriented tasks on large
graphs. A prototype [19] is developed on top of Pregel extended with a network
performance-aware partitioning framework.
7.3.1.5 Trinity
Trinity [67] is a distributed memory-based general purpose graph engine. An obser-
vation is that MapReduce implementation of graph processing can lead to huge I/O
Search WWH ::




Custom Search