Database Reference
In-Depth Information
Once a problem is modeled as a graph, it can be distributed over machines in
a distributed system using a graph partitioning technique . Graph partitioning
implies dividing the work (i.e., the vertices) over distributed nodes for efficient
distributed computation. As is the case with data parallelism, the basic idea is
simple; by distributing a large graph across multiple machines, it becomes pos-
sible to process different parts of the graph in parallel. As such, graph partitioning
enables what we refer to as graph parallelism . The standard objective of graph
partitioning is to uniformly distribute the work over p processors by partitioning
the vertices into p equally weighted partitions, while minimizing inter-node com-
munication reflected by edges. Such an objective is typically referred to as the
standard edge cut metric [34]. The graph partitioning problem is NP-hard [21],
yet heuristics can be implemented to achieve near optimal solutions [34,35,39].
As a specific example, Figure 1.11 demonstrates three partitions, P 1 , P 2 , and P 3
at which vertices v 1 ,..., v 8 are divided using the edge cut metric. Each edge has a
weight of 2 corresponding to 1 unit of data being communicated in each direction.
Consequently, the total weight of the shown edge cut is 10. Other cuts will result
in more communication traffic. Clearly, for communication-intensive applications,
graph partitioning is very critical and can play a dramatic role in dictating the
overall application performance. We discuss some of the challenges pertaining to
graph partitioning in Section 1.5.3.
As real examples, both Pregel and GraphLab employ graph partitioning.
Specifically, in Pregel each vertex in a graph is assigned a unique ID, and partition-
ing of the graph is accomplished via using a hash(ID) mod N function, where N is
the number of partitions. The hash function is customizable and can be altered by
users. After partitioning the graph, partitions are mapped to cluster machines using
a mapping function of a user choice. For example, a user can define a mapping func-
tion for a Web graph that attempts to exploit locality by co-locating vertices of the
same Web site (a vertex in this case represents a Web page). In contrast to Pregel,
GraphLab utilizes a two-phase partitioning strategy. In the first phase, the input
P 1
v 1
v 3
A minimum cut with a
total weight of 10
v 2
2
2
2
2
P 3
P 2
2
v 4
v 7
v 8
v 5
v 6
FIGURE 1.11
A graph partitioned using the edge cut metric.
Search WWH ::




Custom Search