Network Performance Aware Graph Partitioning for Large Graph Processing Systems in the Cloud - Large Scale and Big Data: Processing and Management


and communication overhead. Trinity pools the memory of the machines in the
cloud into a “memory cloud,” which enables the fast random data access that
computation on graphs particularly benefits from. In addition, Trinity includes a
native graph storage engine. These techniques significantly speed up large graph
processing. Trinity supports both transactional and batched graph processing.

7.3.1.6 GraphLab

GraphLab [56] is designed specifically for machine learning and data mining
algorithms, which are not naturally supported by MapReduce. The GraphLab
abstraction enables developers to specify asynchronous, dynamic, graph-parallel
computation while ensuring data consistency and achieving a high degree of
parallel performance in the shared-memory setting. Unlike Pregel, which uses the
BSP model, GraphLab uses an asynchronous parallel model. The GraphLab framework
has also been extended to the distributed setting while preserving strong data
consistency guarantees [55].
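The difference between the two execution models can be illustrated with a minimal, single-threaded sketch. It is not GraphLab's or Pregel's API: the function names `bsp_min_label` and `async_min_label` are hypothetical, and minimum-label propagation (a simple way to find connected components) is chosen here only as a stand-in workload. In the BSP-style version every vertex reads the labels of the previous superstep; in the asynchronous version a vertex sees its neighbours' latest labels immediately, and only vertices whose neighbourhood changed are rescheduled.

```python
from collections import deque


def bsp_min_label(adj):
    """BSP-style propagation: each superstep reads a frozen snapshot of labels."""
    labels = {v: v for v in adj}
    supersteps = 0
    while True:
        # All vertices compute from the previous superstep's labels only.
        new_labels = {
            v: min([labels[v]] + [labels[u] for u in adj[v]]) for v in adj
        }
        supersteps += 1
        if new_labels == labels:  # no vertex changed: global convergence
            return labels, supersteps
        labels = new_labels  # barrier: publish all updates at once


def async_min_label(adj):
    """Asynchronous propagation: updates are visible immediately, in place."""
    labels = {v: v for v in adj}
    queue = deque(adj)   # vertices scheduled for (re)computation
    queued = set(adj)
    updates = 0
    while queue:
        v = queue.popleft()
        queued.discard(v)
        best = min([labels[v]] + [labels[u] for u in adj[v]])
        if best < labels[v]:
            labels[v] = best  # neighbours can read this value right away
            updates += 1
            # Dynamic scheduling: only neighbours of a changed vertex rerun.
            for u in adj[v]:
                if u not in queued:
                    queue.append(u)
                    queued.add(u)
    return labels, updates


# Hypothetical example graph: a path 0-1-2-3 and a separate edge 4-5.
example_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
```

Both functions return the same component labels for an undirected graph; they differ in how much work is done and when updates become visible, which is the distinction the text draws between the BSP and asynchronous models.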

Other cloud-based solutions for graph processing include the following. DisG [81]
is an ongoing project for web graph reconstruction using Hadoop. Pujol et al. [65]
studied different replication methods to scale social network analysis. Hama [5] and
Giraph [4] are two open-source projects targeting large graph processing. They adopt
Pregel's programming model, and their storage is built on top of the Hadoop
Distributed File System. While the solutions mentioned above focus on batch
processing, there are also transactional graph databases such as Neo4j and
InfiniteGraph. Finally, a number of cloud-based data management systems have
recently been developed for other important workloads such as data warehousing
[2,35,77] and online transaction processing [22], which are beyond the scope of
this chapter.

7.3.2 Comparison of Existing Systems

Table 7.1 provides a brief comparison of a number of representative graph processing
systems with respect to graph storage, support for online processing, main-memory
processing, and distributed processing. Neo4j and HyperGraphDB

TABLE 7.1
Comparison of Representative Systems (An Extended Version Based on Table 2 in [68])

Column headings: Online Query Processing; Memory-Based Exploration; Distributed Parallel Processing; Native Graphs

Entries per system:
Neo4j: Yes, No
HyperGraphDB: No, Yes, No
InfiniteGraph: Yes, No, Yes
MapReduce: No, Yes
PEGASUS: No, Yes
Surfer: Yes, No, Yes
Google's Pregel: No, Yes
Microsoft's Trinity: Yes
