Network Performance Aware Graph Partitioning for Large Graph Processing Systems in the Cloud - Large Scale and Big Data: Processing and Management


the WWW graph is the computation of web pages' PageRank [63] for web searches. Leaving aside engineering details such as the damping factor, the PageRank algorithm iteratively computes the PageRank of each page from the PageRanks of the pages that link to it. The algorithm terminates when the PageRanks of the pages converge. The PageRank algorithm is often used as an example to illustrate the performance characteristics of cloud-based platforms.
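The iteration described above can be sketched as a simple power iteration over an adjacency list. This is a minimal single-machine illustration, not the formulation used by any particular platform; the damping factor of 0.85, the convergence tolerance, the iteration cap, and the uniform redistribution of rank from pages without out-links are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, tol=1e-8, max_iter=100):
    """Power-iteration PageRank (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    damping, tol and max_iter are assumed defaults, not values from the text.
    """
    # Collect every page, including pages that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Reverse index: for each page, the pages that link to it.
    incoming = {p: [] for p in pages}
    for src, targets in links.items():
        for dst in set(targets):
            incoming[dst].append(src)
    out_degree = {p: len(set(links.get(p, []))) for p in pages}

    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Assumption: rank held by pages with no out-links is spread uniformly.
        dangling = sum(rank[p] for p in pages if out_degree[p] == 0)
        new_rank = {}
        for p in pages:
            # New rank of p is computed from the ranks of the pages linking to p.
            inflow = sum(rank[q] / out_degree[q] for q in incoming[p])
            new_rank[p] = (1 - damping) / n + damping * (inflow + dangling / n)
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # stop once the ranks have converged
            break
    return rank


ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy graph, page C, which receives links from both A and B, ends up with the highest rank. On a cloud platform, the per-page update inside the loop is the step that gets distributed, since each page only needs the current ranks of its in-neighbours.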

7.2.3 Information Networks

Resource Description Framework (RDF) has been an official W3C recommendation for the semantic web. RDF triples naturally form a graph: each (subject, predicate, object) triple is a labelled edge from the subject to the object. Among other uses, RDF has been applied to knowledge bases such as DBpedia [6]. The DBpedia ontology, derived from Wikipedia, contains 3.7 million “things” and 400 million facts.* Such data are particularly useful for users who formulate complex queries about the information represented in RDF. New applications of the semantic web continue to emerge each year [1].
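To make the triple-to-graph correspondence concrete, the sketch below builds an edge-labelled adjacency list from a handful of triples and answers a two-step path query over it. The triples and predicate names are hand-written illustrations, not data taken from DBpedia, and `follow` is a hypothetical helper that only loosely mimics what a chain of triple patterns in a query language such as SPARQL would match.

```python
from collections import defaultdict


def triples_to_graph(triples):
    """Build a directed, edge-labelled graph: subject -> [(predicate, object), ...]."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def follow(graph, start, path):
    """Return the nodes reached from `start` by following the predicates in `path` in order."""
    frontier = {start}
    for predicate in path:
        frontier = {
            obj
            for node in frontier
            for (p, obj) in graph.get(node, [])  # .get avoids adding empty entries
            if p == predicate
        }
    return frontier


# Illustrative triples (hypothetical, not an excerpt of any real knowledge base).
triples = [
    ("Alan_Turing", "birthPlace", "London"),
    ("Alan_Turing", "field", "Computer_science"),
    ("London", "country", "United_Kingdom"),
]
g = triples_to_graph(triples)
print(follow(g, "Alan_Turing", ["birthPlace", "country"]))  # {'United_Kingdom'}
```

Answering even this small question means traversing two edges, which is why queries over large RDF graphs are naturally treated as graph-processing workloads.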

Search engine providers are actively introducing semantics into next-generation search engines (e.g., Probase [78]).† According to a recent report [13], Satori, Microsoft's graph-based knowledge base that enhances the search capabilities of Bing, consists of more than 300 million nodes and 800 million edges. Google's Knowledge Graph has 570 million objects and 18 billion facts about the relationships between different objects. These knowledge graphs are expected to improve the ranking of search results.

7.2.4 Miscellaneous

Other examples of large graphs include citation relationships among research articles, relationships between US patents,‡ WordNet,§ communication networks, transportation or road networks, and many others. Some of these graphs can be found in the collection of graphs maintained by the Stanford Network Analysis Project (SNAP) [53].
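Many graphs in collections such as SNAP are distributed as plain-text edge lists. The sketch below assumes that layout (comment lines starting with `#`, then one whitespace-separated pair of integer node IDs per line) and loads it into a list of directed edges; the function names and the sample lines are hypothetical, and real files should be checked against their own headers before relying on this format.

```python
def read_edge_list(lines):
    """Parse an edge list, skipping blank lines and '#' comment lines.

    Assumes each data line starts with two integer node IDs separated by whitespace.
    """
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]  # ignore any extra columns
        edges.append((int(src), int(dst)))
    return edges


def summarize(edges):
    """Return (number of distinct nodes, number of edges)."""
    nodes = {u for edge in edges for u in edge}
    return len(nodes), len(edges)


# Hypothetical sample in the assumed format.
sample = [
    "# Directed graph: toy.txt",
    "# FromNodeId\tToNodeId",
    "0\t1",
    "0\t2",
    "2\t0",
]
edges = read_edge_list(sample)
print(summarize(edges))  # (3, 3)
```

Because `read_edge_list` accepts any iterable of lines, an open file object (for example, `with open(path) as f: edges = read_edge_list(f)`) can be passed directly, so the file is streamed rather than read into memory at once.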

7.3 CLOUD-BASED GRAPH PROCESSING PLATFORMS

As described in the previous section, graph data are ubiquitous and their volume is ever increasing. New computation- and data-intensive analysis tasks on graphs are continuously being reported. Deployments of applications on such data have been moving from a small number of high-performance servers or supercomputers [31,46] toward clouds with large numbers of commodity servers [43,58].

A number of general-purpose development platforms, such as MapReduce [23], its open-source counterpart Hadoop [33], and Dryad [37], have been proposed to help users develop custom applications on the cloud without worrying about the complexity beneath it. For instance, data may be stored in distributed and replicated file

* DBpedia SPARQL Benchmark: http://aksw.org/Projects/DBPSB.html.

† Probase: http://research.microsoft.com/probase/.

‡ US Patent: http://vlado.fmf.uni-lj.si/pub/networks/data/patents/Patents.htm.

§ Wordnet: http://vlado.fmf.uni-lj.si/pub/networks/data/dic/Wordnet/Wordnet.zip.
