Database Reference
In-Depth Information
8.6.4 Static Patterns ................................................................................... 275
8.6.4.1 Diameter............................................................................. 275
8.6.4.2 Shape of Distribution ......................................................... 278
8.6.4.3 Radius Plot of GCC ........................................................... 280
8.6.4.4 “Core” and “Whisker” Vertices ......................................... 280
8.6.5 Temporal Patterns ............................................................................. 282
8.7 Conclusion .................................................................................................... 283
References .............................................................................................................. 284
8.1 INTRODUCTION
The scale of graph data that is nowadays collected and required to be processed is mas-
sive. For example, in the context of online services, the web graph amounts to at least
1 trillion of links [1]. Facebook recently reported more than 1 billion of users and 140
billion of friend connections [2], and Twitter reported in 2009 more than 40 million
of users and about 1.5 billion of social relations [30]. The unprecedented proliferation
of data provides us with new opportunities and benefits but also poses hard compu-
tational challenges. Frequent graph computations such as community detection [25],
finding connected components [29], iterative computations using graph input data such
as computing PageRank and its variations [41], and shortest path and radius computa-
tions [17,28] become challenging computational tasks in the realm of big graph data.
In this chapter, we describe PEGASUS, an open-source Peta Graph Mining library,
which performs typical graph mining tasks such as computing the diameter of a graph,
computing the radius of each node, finding the connected components, and computing
the importance score of nodes. The main idea behind PEGASUS is to capitalize on
matrix-vector multiplication as a main primitive for the software engineer. Inspired by
the work of [47], which showed that triangles can be estimated by few matrix-vector
multiplications, PEGASUS introduces a set of different operators that solve a variety
of graph mining tasks together with an optimized implementation of matrix-vector
multiplications in MapReduce. PEGASUS is a solid engineering effort that allows us
to manipulate large-scale graphs. Since the introduction of PEGASUS, other large-
scale graph processing systems have been introduced, among them Google's Pregel
[36], LinkedIn's Giraph [3], and GraphLab [34]. It is worth mentioning that Giraph
uses several algorithms and ideas from PEGASUS, including the connected compo-
nents algorithm. Also, PEGASUS has been included in Hadoop for Windows Azure
[4]. PEGASUS provides us with the ability to investigate the structure of large-scale
graphs. This allows us to detect outliers and obtain significant structural patterns.
This chapter is organized as follows: Section 8.2 presents related work. Section 8.3
presents the proposed method. Section 8.4 presents Hadoop implementations and Section
8.5 timings. Section 8.6 shows findings of PEGASUS in several real-world networks.
8.2 RELATED WORK
In Section 8.2.1, we discuss two established patterns, the bow-tie structure of the web
and the six degrees of separation theory. In Section 8.2.2, we provide basic graph theo-
retic definitions for our work. In Section 8.2.3, we discuss work related to computing the
Search WWH ::




Custom Search