Beyond machine-level parallelism, it is desirable to exploit intra-machine parallelism.
On multicore CPUs, parallel libraries like MTGL [12] have been developed for par-
allel graph algorithms. MTGL offers a set of data structures and APIs for building
graph algorithms. The MTGL API is modeled after the Boost Graph Library [69] and
optimized to leverage shared memory multithreaded machines. The SNAP framework
[7] provides a set of algorithms and building blocks for graph analysis, especially for
small-world graphs. On the GPU, a general-purpose programming framework called
Medusa [80] has been developed. The goal is to hide the details of graph programming
and GPU runtime from users. In contrast to Pregel, Medusa adopts very fine-grained
processing on vertices/edges/messages to exploit the massive parallelism of the GPU.
Additionally, there are specific parallel graph algorithms on the GPU [34,36,48,75].
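The contrast between Pregel's coarse vertex-centric model and Medusa's fine-grained edge-level decomposition can be sketched in plain Python. This is an illustrative CPU sketch only; the toy graph and variable names are assumptions, and a real GPU framework would launch one thread per edge or message rather than loop sequentially.

```python
# Toy directed graph as an edge list: (src, dst).
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
num_vertices = 3
value = [1.0, 1.0, 1.0]  # one value per vertex

# Vertex-centric (Pregel-like): each unit of work is a whole vertex,
# which iterates over all of its out-edges.
out_edges = {v: [d for s, d in edges if s == v] for v in range(num_vertices)}
recv_v = [0.0] * num_vertices
for v in range(num_vertices):          # one task per vertex
    share = value[v] / max(len(out_edges[v]), 1)
    for d in out_edges[v]:
        recv_v[d] += share

# Edge-level (Medusa-like): each unit of work is a single edge, so the
# available parallelism scales with the number of edges, not vertices --
# a better match for the massive thread counts of a GPU.
out_degree = [len(out_edges[v]) for v in range(num_vertices)]
recv_e = [0.0] * num_vertices
for s, d in edges:                     # one task per edge
    recv_e[d] += value[s] / out_degree[s]
```

Both decompositions compute the same per-vertex result; only the granularity of the work units differs.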
7.4 UNEVEN BANDWIDTH BETWEEN THE MACHINES OF THE CLOUD
The cloud-based solutions discussed in the previous section provide a user-friendly platform on which users can develop their custom logic without worrying about how the underlying interconnected machines operate. However, the unique network environment that connects such a large number of servers adds further challenges to large graph processing. In this section, we discuss the hardware and software factors that cause network bandwidth unevenness in the cloud.
7.4.1 Factor 1: Network Environment
Due to its significant scale, the cloud network environment differs markedly from previous distributed environments [44,46,52], for example, Cray supercomputers or small-scale clusters. In a small-scale cluster, the network bandwidth is often roughly the same for every machine pair. In the cloud environment, however, the network bandwidth is uneven among different machine pairs.
Current cloud infrastructures often use a switch-based tree structure to interconnect the servers [10,32,41]. Machines are first grouped into pods, and pods are then connected to higher-level switches. A natural consequence of such a topology is that the network bandwidth between any machine pair is not uniform: it is determined by the switches that connect the two machines [37]. The intra-pod bandwidth is much higher than the cross-pod bandwidth.
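This intra-pod versus cross-pod distinction can be modeled with a small sketch. The pod size and bandwidth figures below are illustrative assumptions, not measurements from any particular cloud.

```python
# Sketch of pod-based bandwidth unevenness: machines in the same pod
# communicate through one low-level switch at full speed, while
# cross-pod traffic traverses higher-level switches at lower effective
# bandwidth. All constants are illustrative assumptions.

MACHINES_PER_POD = 4
INTRA_POD_GBPS = 10.0   # assumed bandwidth within a pod
CROSS_POD_GBPS = 1.0    # assumed bandwidth across pods

def pod_of(machine_id: int) -> int:
    """Pod index of a machine under simple sequential grouping."""
    return machine_id // MACHINES_PER_POD

def pair_bandwidth(a: int, b: int) -> float:
    """Estimated bandwidth (Gb/sec) between two machines."""
    return INTRA_POD_GBPS if pod_of(a) == pod_of(b) else CROSS_POD_GBPS
```

For instance, machines 0 and 3 share pod 0 and see the high intra-pod bandwidth, while machines 0 and 5 sit in different pods and see the lower cross-pod bandwidth.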
Knowledge of the network topology (exploited, for example, in multilevel data reduction along the tree topology [23] and partition-based locality optimizations [64]) and scheduling techniques [38] is crucial for advanced optimization in the cloud. However, it should also be remarked that topology information is usually not available to cloud users due to virtualization and system management issues.
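The multilevel data reduction mentioned above can be sketched as a two-level sum: partial results are first combined inside each pod over the cheap high-bandwidth links, and only one value per pod crosses the oversubscribed higher-level switches. The pod grouping and data below are illustrative assumptions.

```python
# Topology-aware reduction sketch: aggregate pod-first so that the
# number of messages crossing the slow inter-pod links equals the
# number of pods, not the number of machines.

def two_level_reduce(values_by_machine, pod_of):
    """Sum per-machine values within each pod, then across pods.

    Returns the global sum and the number of cross-pod messages sent
    (one partial sum per pod).
    """
    pod_partials = {}
    for machine, v in values_by_machine.items():
        pod = pod_of(machine)
        pod_partials[pod] = pod_partials.get(pod, 0) + v  # intra-pod step
    # Cross-pod step: only one partial sum per pod leaves the pod.
    return sum(pod_partials.values()), len(pod_partials)

values = {0: 3, 1: 5, 2: 7, 3: 2, 4: 8, 5: 1}  # six machines
total, cross_pod_msgs = two_level_reduce(values, lambda m: m // 3)
```

With six machines grouped into pods of three, only two partial sums cross the inter-pod links, instead of six per-machine messages under a flat reduction.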
Finally, a simple reason for network unevenness is that the commodity computers in the cloud may not have a uniform network configuration (e.g., network adaptors). As the cloud evolves, its computers may become heterogeneous from generation to generation [79]. For example, current mainstream network adaptors provide 1 Gb/sec, while adaptors with 10 Gb/sec are gradually being employed. These