Network Performance Aware Graph Partitioning for Large Graph Processing Systems in the Cloud - Large Scale and Big Data: Processing and Management

Database Reference

In-Depth Information

systems such as GFS [30] or BigTable [17]. Such systems are suitable for process-

ing flat data structures, not just graph structured data. In particular, it is known that

much graph analysis inherently involves random access and direct adoption of the

technologies for flat files or relations may lead to high (network) communication

costs in the cloud. It is desirable to have a graph-processing platform that automati-

cally handles optimization details for users.

Graph processing platforms for the cloud have recently been proposed. Most of

these platforms (e.g., [42,43,81]) are built on top of MapReduce [23]. In this section,

we give a brief survey of some representative solutions.

7.3.1 s urvey oF e Xisting s ystems

7.3.1.1 Pregel

Pregel [58] is a vertex-oriented graph processing engine that implements a Bulk

Synchronous Parallel (BSP) model. Pregel passes computational results between

workers. It provides a user-defined API Compute () executed on vertices. In one itera-

tion of BSP (i.e., superstep in Pregel's terminology), Pregel executes Compute () on

all the vertices in parallel. Messages are passed over the network. Vertices vote to

halt if they have no work to do.

7. 3 .1. 2 P EGASUS

PEGASUS [43] is an open-source Hadoop-based library that supports typical graph

mining operations including PageRank, spectral clustering, diameter and radius esti-

mations, and connected components. An important observation is that many such

mining operations can be readily expressed as an iterative matrix-vector multiplica-

tion. PEGASUS therefore proposes a scalable, highly optimized primitive called

generalized iterated matrix-vector multiplication that includes block multiplication,

clustered edges, and diagonal block iteration.

7.3.1.3 HADI

HADI [42] is a graph mining implementation (developed on Hadoop) that estimates

the radii and diameter of a large graph. To tackle the scale of large graphs, HADI

proposes an approximation algorithm implemented and optimized for the cloud

framework Hadoop/MapReduce.

7.3.1.4 Surfer

Surfer [18] is a large graph processing engine that provides two primitives for devel-

oping applications on the cloud: MapReduce and propagation. MapReduce is useful

for applications processing flat data structures. In comparison, the second primi-

tive propagation operation is designed for developing edge-oriented tasks on large

graphs. A prototype [19] is developed on top of Pregel extended with a network

performance-aware partitioning framework.

7.3.1.5 Trinity

Trinity [67] is a distributed memory-based general purpose graph engine. An obser-

vation is that MapReduce implementation of graph processing can lead to huge I/O

Search WWH ::

Custom Search

Home