Database Reference
In-Depth Information
of minimizing the weight of cross-partition edges in the machine graph corre-
sponds to minimizing the number of cross-partition edges in the data graph. This
is a graceful adaptation on assigning the network bandwidth to partitions with
different number of cross-partition edges.
7.5.3.5 Partition Numbers
A minor detail is that it is preferable to make partitions with the roughly same
number of machines is for load-balancing purpose, since partitions in the data
graph also have similar sizes. On the other hand, the number of partitions returned
by the algorithm may also be specified by the user. A reasonable choice is to deter-
mine P so that each graph partition can fit into the main memory of a machine.
This is to avoid the significant performance degradation due to the random disk
I/O in graph processing.
Finally, the partitioning algorithm discussed in this section satisfies the three
design principles: (1) the number of cross-partition edges is gradually adapted to
the network bandwidth. In each bisection of the recursion, the cut with the mini-
mum number of cross-partition edges in the data graph coincides that with minimum
aggregated bandwidth in the machine graph. (2) The partition size is tuned accord-
ing to the amount of main memory available to reduce the random disk accesses.
(3)  In the iteration, the proximity among partitions in the machine graph matches
that in the data graph.
7.6 HIERARCHICAL COMBINATION OF EXECUTION
With partitioned graph, the graph execution model may exploit data locality to
reduce network traffic in data-intensive computing systems [23,37]. The basic idea is
to apply a Combine () function (i.e., Combiner in Pregel), and perform partial merg-
ing of the intermediate data before they are sent over the network. Combination is
applicable when the combination function is annotated as an associative and com-
mutative function.
A basic approach is local combination . Current cloud-based graph engines like
Pregel and Trinity support this basic approach. For all the graph partitions on a
machine, one may apply the local combination on the boundary vertices belonging
to the same remote partition and send the combined intermediate results back to the
local partition for further processing.
Local combination is not aware of the network bandwidth unevenness in the cloud
network environment. This motivates the approach of hierarchical combination . In
local combination, it requires network data transfer for the boundary vertices of the
graph partition. Due to the irregular graph structures, the source vertices are likely
to be scattered on many different machines. Thus, many data transfers are performed
on the relatively low-bandwidth machine pairs caused by the network bandwidth
unevenness. Therefore, instead of direct data transfers after local combination, one
may exploit the machine graphs for local combination as follows.
The data of the source vertices can be combined among the machines with high
bandwidth before sending them to the target machine via the connections with low
bandwidth. Hierarchical combination applies this idea in multiple levels according to
Search WWH ::




Custom Search