proposed an algorithm that generates a query plan whose cost is bounded by the log
of the total number of variables in the given SPARQL query. An approach for opti-
mizing RDF graph pattern matching by reinterpreting certain join tree structures as
grouping operations has been presented in [81,117]. The proposed approach represents the intermediate results as sets of groups of triples called TripleGroups and
uses Nested TripleGroup Algebra for manipulating them. Abouzied et al. [4] have
demonstrated an approach for storing and querying RDF data using the HadoopDB
system in conjunction with a column-oriented database [2] that can provide a prom-
ising solution for supporting efficient and scalable semantic web applications. A
similar approach has been presented in [71], which replaces the column-oriented backend database with RDF-3X [106], a state-of-the-art RDF query processor.
Surfer [34] is a large-scale graph-processing engine that is designed to provide
two basic primitives for programmers: MapReduce and Propagation. In this engine,
MapReduce processes different key-value pairs in parallel, and propagation is an iterative computational pattern that transfers information along the edges from a vertex
to its neighbors in the graph. In principle, these two primitives are complementary
in graph processing where MapReduce is suitable for processing flat data structures
(e.g., vertex-oriented tasks) while propagation is optimized for edge-oriented tasks
on partitioned graphs. Lattanzi et al. [87] presented a set of MapReduce-based algo-
rithms for a variety of fundamental graph problems such as minimum spanning
trees, maximal matchings, approximate weighted matchings, approximate vertex
and edge covers, and minimum cuts. All of the presented algorithms are parameterized by the amount of memory available per machine, which determines the number of MapReduce rounds.
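As a generic illustration of this style (a minimal Python sketch, not one of the algorithms from [87]; the function names are invented for the example), a single MapReduce round over an edge list can compute the degree of every vertex: the map phase emits a key-value pair per edge endpoint, and the reduce phase sums the pairs grouped by vertex key.

```python
from collections import defaultdict

def map_phase(edges):
    """Map: emit (vertex, 1) for each endpoint of every edge."""
    for u, v in edges:
        yield (u, 1)
        yield (v, 1)

def reduce_phase(pairs):
    """Reduce: group the emitted pairs by key and sum the counts."""
    groups = defaultdict(int)
    for key, count in pairs:
        groups[key] += count
    return dict(groups)

def vertex_degrees(edges):
    """One MapReduce round computing the degree of every vertex."""
    return reduce_phase(map_phase(edges))

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
print(vertex_degrees(edges))  # {'a': 2, 'b': 2, 'c': 3, 'd': 1}
```

A chained graph algorithm would repeat such rounds, feeding the reduce output of one round into the map phase of the next, which is exactly the pattern whose overheads the following paragraph discusses.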
In general, graph algorithms can be written as a series of chained MapReduce
invocations that requires passing the entire state of the graph from one stage to the
next. However, this approach is ill-suited for graph processing and can lead to sub-
optimal performance due to the additional communication and associated serialization
overhead, in addition to the need to coordinate the steps of a chained MapReduce.
The Pregel system [98] has been introduced by Google as a scalable platform for implementing graph algorithms. It relies on a vertex-centric approach, which is inspired
by the Bulk Synchronous Parallel model (BSP) [127], where programs are expressed
as a sequence of iterations, in each of which a vertex can receive messages sent in
the previous iteration, send messages to other vertices, and modify its own state as
well as that of its outgoing edges or mutate graph topology. In particular, Pregel
computations consist of a sequence of iterations, called supersteps. During a superstep the framework invokes a user-defined function for each vertex, conceptually in
parallel, which specifies the behavior at a single vertex V and a single superstep S. It
can read messages sent to V in superstep S - 1, send messages to other vertices that
will be received at superstep S + 1, and modify the state of V and its outgoing edges.
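This superstep contract can be sketched in miniature (a hypothetical Python simulation of the vertex-centric model, not Pregel's actual API) using the classic example of propagating the maximum vertex value: each vertex adopts the largest value among its own and its incoming messages, sends the result to its neighbors when it changes, and otherwise votes to halt.

```python
def pregel_max(adjacency, values, max_supersteps=30):
    """Toy BSP loop: messages sent in superstep S arrive in superstep S + 1."""
    inbox = {v: [] for v in adjacency}
    active = set(adjacency)                    # every vertex starts active
    for superstep in range(max_supersteps):
        if not active:
            break                              # all vertices voted to halt
        outbox = {v: [] for v in adjacency}
        next_active = set()
        for v in adjacency:
            msgs = inbox[v]                    # messages sent in superstep S - 1
            if v not in active and not msgs:
                continue                       # halted and nothing to process
            new_value = max([values[v]] + msgs)
            if superstep == 0 or new_value > values[v]:
                values[v] = new_value          # modify the state of V
                for u in adjacency[v]:         # delivered in superstep S + 1
                    outbox[u].append(new_value)
                next_active.add(v)
            # otherwise the vertex votes to halt
        inbox, active = outbox, next_active
    return values

adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
values = {"a": 3, "b": 6, "c": 2, "d": 1}
print(pregel_max(adjacency, values))  # {'a': 6, 'b': 6, 'c': 6, 'd': 6}
```

The sketch keeps the essential BSP property: all messages produced in one superstep become visible only at the start of the next, so no vertex ever observes a half-updated neighbor within the same iteration.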
Messages are typically sent along outgoing edges, but a message may be sent to any
vertex whose identifier is known (Figure 2.19). Similar to the MapReduce framework,
Pregel has been designed to be an efficient, scalable, and fault-tolerant implementa-
tion on clusters of thousands of commodity computers where the distribution-related
details are hidden behind an abstraction. It keeps vertices and edges on the machine
that performs computation and uses network transfers only for messages. Hence, the