Database Reference
In-Depth Information
partitioning and graph processing algorithm aware of the bandwidth unevenness for
networking efficiency.
Foremost, graph partitioning has been a classical combinatorial optimization
problem, with an input objective function. The input objective function is to mini-
mize the number of cross-partition edges with the constraint of all partitions with
similar number of edges. This is because the total number of cross-partition edges
is often a good indicator of the amount of communication between partitions in dis-
tributed computation. It is an NP-complete problem [50].
The network performance aware graph partitioning framework discussed in
this section improves the network performance of graph partitioning process itself.
Moreover, the partitions generated from the framework improve the network per-
formance of graph processing tasks. The basic idea of the framework is to partition,
store, and process the graph partitions according to their numbers of cross-partition
edges such that the partitions with a large number of cross-partition edges are stored
in the machines with high network bandwidth between them, as the network traffic
requirement for those graph partitions is high. To achieve this, the framework parti-
tions both the data graph and a “machine graph” (defined next) simultaneously.
7.5.1 m aChine g raPh
To capture the network bandwidth unevenness, a complete weighted undirected
graph (namely machine graph ) models the machines chosen for graph partitioning.
Each machine is modeled as a vertex; an edge represents the connectivity between
the two machines, and the bandwidth between any two machines is represented as
the weight of an edge. For simplicity, assume that each machine has the same con-
figuration in terms of computation power and main memory. In practice, users usu-
ally acquire the virtual machines of the same type for one application, because of
convenience and management. In addition, an undirected graph is used in the model,
as the bandwidth can often be similar in both directions.
Machine Graph Building. The machine graph can be built without the knowl-
edge or control of the network physical topology, as follows.
Given a set of machines for partitioning, the machine graph can be constructed
by calibrating the network bandwidth between any two machines in the set. The
network bandwidth can be measured by sending a data chunk of 8 MB and using
the average of twenty measurements as the estimated bandwidth. For N virtual
machines, only N iterations of calibrations are needed to measure all pairwise per-
formance. In each iteration, N
2 machine pairs are calibrated. The maintenance is
based on the classic exponential average by getting the bandwidth of data transfer in
the graph processing.
Example
The left part of Figure 7.2a illustrates the machine graph for four machines in a
cluster with tree topology. The edge thickness represents the weight: a thicker
edge means a link with higher bandwidth. The example cluster consists of two
Search WWH ::




Custom Search